Abstract

We study Byzantine fault-tolerant distributed optimization of a sum of convex (cost) functions
with real-valued scalar input/output. In particular, the goal is to optimize a global cost function
$\frac{1}{|\mathcal{N}|}\sum_{i\in\mathcal{N}} h_i(x)$, where $\mathcal{N}$ is the set of non-faulty agents, and $h_i(x)$ is agent $i$'s local cost function,
which is initially known only to agent $i$. In general, when some of the agents may be Byzantine
faulty, the above goal is unachievable, because the identity of the faulty agents is not necessarily
known to the non-faulty agents, and the faulty agents may behave arbitrarily. Since the above global
cost function cannot be optimized exactly in the presence of Byzantine agents, we define a weaker
version of the problem.

The goal for the weaker problem is to generate an output that is an optimum of a function
formed as a convex combination of local cost functions of the non-faulty agents. More precisely, for
some choice of weights $\alpha_i$ for $i \in \mathcal{N}$ such that $\alpha_i \ge 0$ and $\sum_{i\in\mathcal{N}} \alpha_i = 1$, the output must be an
optimum of the cost function $\sum_{i\in\mathcal{N}} \alpha_i h_i(x)$. Ideally, we would like $\alpha_i = \frac{1}{|\mathcal{N}|}$ for all $i \in \mathcal{N}$; however,
this cannot be guaranteed due to the presence of faulty agents. In fact, we show that the maximum
achievable number of nonzero weights ($\alpha_i$'s) is $|\mathcal{N}| - f$, where $f$ is the upper bound on the number
of Byzantine agents. In addition, we present algorithms that ensure that at least $|\mathcal{N}| - f$ agents
have weights that are bounded away from 0. A low-complexity suboptimal algorithm is proposed,
which ensures that at least $\lceil n/2 \rceil - \phi$ agents have weights that are bounded away from 0, where $n$ is
the total number of agents, and $\phi$ ($\phi \le f$) is the actual number of Byzantine agents.


* This research is supported in part by National Science Foundation awards NSF 1329681 and 1421918. Any opinions, 
findings, and conclusions or recommendations expressed here are those of the authors and do not necessarily reflect 
the views of the funding agencies or the U.S. government. 
1 System Model and Problem Formulation

The system under consideration is synchronous, and consists of $n$ agents connected by a complete
communication network. The set of agents is $\mathcal{V} = \{1, \dots, n\}$. We assume that $n > 3f$ for reasons
that will be clearer soon. We say that a function $h : \mathbb{R} \to \mathbb{R}$ is admissible if (i) $h(\cdot)$ is convex
and continuously differentiable, and (ii) the set $\arg\min_{x\in\mathbb{R}} h(x)$ containing the optima of $h(\cdot)$ is
non-empty and compact (i.e., bounded and closed). Each agent $i \in \mathcal{V}$ is initially provided with an
admissible local cost function $h_i : \mathbb{R} \to \mathbb{R}$.

Up to $f$ of the $n$ agents may be Byzantine faulty. Let $\mathcal{F}$ denote the set of faulty agents, and let
$\mathcal{N} = \mathcal{V} - \mathcal{F}$ denote the set of non-faulty agents. The set $\mathcal{F}$ of faulty agents may be chosen by an
adversary arbitrarily. Let $|\mathcal{F}| = \phi$. Note that $\phi \le f$ and $|\mathcal{N}| \ge n - f$.

The ideal goal here is to develop algorithms that optimize the average of the local cost functions
at the non-faulty agents, and allow the non-faulty agents to reach consensus on an optimum $\widetilde{x}$. Thus,
ideally, as stated in Problem 1 in Figure 1, each non-faulty agent should output an identical value
$\widetilde{x} \in \mathbb{R}$ that minimizes $\frac{1}{|\mathcal{N}|}\sum_{i\in\mathcal{N}} h_i(x)$.


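Problem 1

$$\widetilde{x} \in \arg\min_{x\in\mathbb{R}} \ \frac{1}{|\mathcal{N}|}\sum_{i\in\mathcal{N}} h_i(x)$$

Problem 2

$$\widetilde{x} \in \arg\min_{x\in\mathbb{R}} \ \sum_{i\in\mathcal{N}} \alpha_i h_i(x)$$

such that

$$\forall i \in \mathcal{N},\ \alpha_i \ge 0 \quad\text{and}\quad \sum_{i\in\mathcal{N}} \alpha_i = 1$$

Problem 3 with parameters $\beta$, $\gamma$, $\beta > 0$

$$\widetilde{x} \in \arg\min_{x\in\mathbb{R}} \ \sum_{i\in\mathcal{N}} \alpha_i h_i(x)$$

such that

$$\forall i \in \mathcal{N},\ \alpha_i \ge 0, \qquad \sum_{i\in\mathcal{N}} \alpha_i = 1, \quad\text{and}\quad \sum_{i\in\mathcal{N}} \mathbf{1}(\alpha_i \ge \beta) \ge \gamma$$

Fig. 1: Problem formulations: All non-faulty agents must output an identical value $\widetilde{x} \in \mathbb{R}$ that
satisfies the constraints specified in each problem formulation.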


The presence of Byzantine faulty nodes makes it impossible to design an algorithm that can
solve Problem 1 for all admissible local cost functions (this is shown formally in Appendix B).
Therefore, we introduce a weaker version of the problem, namely, Problem 2 in Figure 1. Problem 2
requires that the output $\widetilde{x}$ be an optimum of a function formed as a convex combination
of local cost functions of the non-faulty agents. More precisely, for some choice of weights $\alpha_i$ for
$i \in \mathcal{N}$ such that $\alpha_i \ge 0$ and $\sum_{i\in\mathcal{N}} \alpha_i = 1$, the output must be an optimum of the weighted cost
function $\sum_{i\in\mathcal{N}} \alpha_i h_i(x)$. Ideally, we would like $\alpha_i = \frac{1}{|\mathcal{N}|}$ for all $i \in \mathcal{N}$, since that would effectively
solve Problem 1. However, as noted above, this cannot be guaranteed due to the presence of faulty
agents. Therefore, in general, not all $\alpha_i$'s are necessarily non-zero for the chosen solution of
Problem 2. The desired goal then is to maximize the number of weights ($\alpha_i$'s) that are bounded
away from zero. With this in mind, we introduce the third and final problem formulation (Problem
3) in Figure 1. In Problem 3, note that $\mathbf{1}\{\alpha_i \ge \beta\}$ is an indicator function that outputs 1 if $\alpha_i \ge \beta$,
and 0 otherwise. Essentially, Problem 3 adds a constraint to Problem 2, requiring that at least $\gamma$
weights must be at least a threshold $\beta$, where $\beta > 0$. Thus, $\beta$ and $\gamma$ are parameters of Problem 3.
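
As a minimal sketch of the Problem 3 constraints, the check below tests whether a given weight vector over the non-faulty agents is feasible. The function name `satisfies_problem3` and the numerical tolerances are illustrative assumptions, not part of the report, and the indicator is assumed to count weights that are at least $\beta$.

```python
from typing import Sequence


def satisfies_problem3(weights: Sequence[float], beta: float, gamma: int,
                       tol: float = 1e-9) -> bool:
    """Check the weight constraints of Problem 3.

    weights[i] plays the role of alpha_i for the i-th non-faulty agent.
    The name and the tolerance are hypothetical choices for this sketch.
    """
    if beta <= 0:
        raise ValueError("Problem 3 requires beta > 0")
    nonnegative = all(w >= -tol for w in weights)
    sums_to_one = abs(sum(weights) - 1.0) <= tol
    # Indicator sum: number of weights that reach the threshold beta.
    large_weights = sum(1 for w in weights if w >= beta - tol)
    return nonnegative and sums_to_one and large_weights >= gamma
```

For example, with five non-faulty agents, the weights `[0.25, 0.25, 0.25, 0.25, 0.0]` are feasible for $\beta = 1/8$ and $\gamma = 4$, while `[0.5, 0.5, 0.0, 0.0, 0.0]` are not.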

We will say that Problem 1, 2 or 3 is solvable if there exists an algorithm that will find a
solution for the problem (satisfying all its constraints) for all admissible local cost functions, and
all possible behaviors of faulty nodes. Our problem formulations require that all non-faulty agents
output identical $\widetilde{x} \in \mathbb{R}$, while satisfying the constraints imposed by the problem (as listed in Figure
1). Thus, the traditional Byzantine consensus [8] problem, which also imposes a similar agreement
condition, is a special case of our optimization problem.$^2$ Therefore, the lower bound of $n > 3f$ for
Byzantine consensus [8] also applies to our problem. Hence we assume that $n > 3f$.

We prove the following key results: 

- (Theorem 1) Problem 1 is not solvable when $f > 0$.

- (Theorem 2) For any $\beta > 0$, Problem 3 is not solvable if $\gamma > |\mathcal{N}| - f$.

- (Theorems 3, 6, and 10) Problem 3 is solvable with $\beta \le \frac{1}{2(|\mathcal{N}| - f)}$ and $\gamma \le |\mathcal{N}| - f$.


Our results in Theorems 3 and 6 can be strengthened to $\beta \le \frac{1}{n}$, as discussed later.
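
To make the achievable parameters concrete, the following sketch evaluates them for given $n$, $f$ and $\phi$. The function name `problem3_guarantees` and the dictionary keys are hypothetical, and the strengthened threshold is assumed to be $1/n$.

```python
from fractions import Fraction


def problem3_guarantees(n: int, f: int, phi: int) -> dict:
    """Evaluate the (beta, gamma) pairs quoted in the key results.

    n: total agents, f: bound on faulty agents, phi: actual faulty agents.
    Names are illustrative; the strengthened beta = 1/n is an assumption.
    """
    if not n > 3 * f:
        raise ValueError("the report assumes n > 3f")
    if not 0 <= phi <= f:
        raise ValueError("phi must satisfy 0 <= phi <= f")
    non_faulty = n - phi          # |N|
    gamma = non_faulty - f        # |N| - f
    return {
        "gamma": gamma,
        "beta_theorem_3": Fraction(1, 2 * gamma),
        "beta_strengthened": Fraction(1, n),
    }
```

For $n = 7$, $f = 2$, $\phi = 1$, this gives $|\mathcal{N}| = 6$, $\gamma = 4$, and thresholds $1/8$ and $1/7$.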

In our other work, we explore subdifferentiable functions, restricted function families, and
asynchronous systems. These results will be presented in other reports.

The rest of the report is organized as follows. Related work is summarized in Section 2.
Impossibility results, in particular, Theorem 1 and Theorem 2, are presented in Section 3. Achievability of
$\gamma = |\mathcal{N}| - f$ is proved constructively in Section 4, wherein five algorithms are proposed. In particular,
Algorithms 1, 2, 3 and 5 solve Problem 3 with $\beta = \frac{1}{2(|\mathcal{N}| - f)}$ and $\gamma = |\mathcal{N}| - f$, and Algorithm
4 solves Problem 3 with $\gamma = n - 2f$ and a threshold $\beta$ that does not depend on $|\mathcal{N}|$. The performance
of Algorithm 4 is thus independent of $|\mathcal{N}|$ and is, in general, slightly weaker than the performance
of Algorithms 1, 2, 3 and 5. An alternative performance analysis of Algorithms 1, 2 and 3 is also
presented: Algorithms 1, 2 and 3 are shown to solve Problem 3 with $\beta = \frac{1}{n}$ and $\gamma = |\mathcal{N}| - f$.
Section 5 presents a low-complexity suboptimal algorithm that solves Problem 3 with
$\gamma = \lceil n/2 \rceil - \phi$ and a positive threshold $\beta$. Our other technical
reports under preparation that extend the results presented in this report are briefly discussed in
Section 6. Section 7 concludes the report.

2 Related Work 

Fault-tolerant consensus [19] is a special case of the optimization problem considered in this report. 
There is a significant body of work on fault-tolerant consensus, including [5,4,15,7,12,22,9]. The 
optimization algorithms presented in this report use Byzantine consensus as a component. 

Convex optimization, including distributed convex optimization, also has a long history [1].
However, we are not aware of prior work that obtains the results presented in this report.
Primal and dual decomposition methods that lend themselves naturally to a distributed paradigm
are well-known [2]. There has been significant research on a variant of the distributed optimization
problem [6,16,21], in which the global objective $h(x)$ is a summation of $n$ convex functions, i.e.,
$h(x) = \sum_{j=1}^{n} h_j(x)$, with function $h_j(x)$ being known to the $j$-th agent. The need for robustness in
distributed optimization problems has received some attention recently [6,10,23,14]. In particular,
Duchi et al. [6] studied the impact of random communication link failures on the convergence of a
distributed variant of the dual averaging algorithm. Specifically, each realizable link failure pattern
considered in [6] is assumed to admit a doubly-stochastic matrix which governs the evolution dynamics
of local estimates of the optimum.

2 This can be proved formally as follows. Suppose that we want to solve the Byzantine consensus problem where
the input of agent $i$ is $a_i \in \{0,1\}$. Then defining $h_i(x) = (x - a_i)^2$ ensures that the output $\widetilde{x}$ of correct algorithms
for Problems 1, 2, 3 will be in the convex hull of the inputs at the non-faulty agents. Choose $\lceil \widetilde{x} \rceil$ as the output for
the consensus problem.
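
The reduction in footnote 2 can be illustrated with a short sketch. For quadratic costs $(x - a_i)^2$, any convex combination is minimized at the weighted mean of the inputs, which lies in their convex hull; rounding up then yields a binary decision. The helper names below are hypothetical, exact rational arithmetic is used to avoid rounding artifacts, and the ceiling rounding follows the reading of the footnote adopted here.

```python
import math
from fractions import Fraction
from typing import Sequence


def weighted_quadratic_optimum(inputs: Sequence[int],
                               weights: Sequence[Fraction]) -> Fraction:
    """Minimizer of sum_i w_i * (x - a_i)**2, i.e. the weighted mean of the a_i."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(w * a for w, a in zip(weights, inputs)) / total


def consensus_decision(x_tilde: Fraction) -> int:
    """Round the optimizer up to obtain a value in {0, 1} for binary inputs."""
    return math.ceil(x_tilde)
```

If all non-faulty inputs equal 0 (or all equal 1), the weighted mean is exactly 0 (or 1), so the decision matches the common input, as validity requires.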

In other related work, significant attempts have been made to solve the problem of distributed
hypothesis testing in the presence of Byzantine attacks [10,23,14], where Byzantine sensors may
transmit fictitious observations aimed at confusing the decision maker into a judgment that
contradicts the true underlying distribution. A consensus-based variant of distributed event
detection, where a centralized data fusion center does not exist, is considered in [10]. In contrast,
in this paper, we focus on Byzantine attacks on the multi-agent optimization problem.

3 Impossibility Results 

Recall that we say that Problem i (i = 1, 2,3) is solvable if there exists an algorithm that will find 
a solution for the problem (satisfying all its constraints) for all admissible local cost functions, and 
all possible behaviors of faulty nodes. The intuitive result below, proved in Appendix B, shows that 
there is no solution for Problem 1 in presence of faulty agents. 

Theorem 1. Problem 1 is not solvable when f > 0. 

Theorem 2 below presents an upper bound on parameter $\gamma$ for Problem 3 to be solvable.

Theorem 2. For any $\beta > 0$, Problem 3 is not solvable if $\gamma > |\mathcal{N}| - f$.

Appendix C presents the proof. In the next section, we will show that the upper bound of Theorem 
2 is achievable. 

4 Proposed Algorithms 

In this section, we present five different algorithms. The first two algorithms, named Algorithm 1 
and Algorithm 2, respectively, are not necessarily practical, but allow us to derive results that are 
useful in proving the correctness of Algorithm 3, Algorithm 4 and Algorithm 5, which are more 
practical. As an alternative to Algorithm 1, Algorithm 2 admits a more concise correctness proof.
Algorithm 4 and Algorithm 5 require less memory than Algorithm 3.
However, in contrast to Algorithms 1, 2 and 3, the local estimates at non-faulty agents converge in 
neither Algorithm 4 nor Algorithm 5. 

4.1 Algorithm 1 

Algorithm 1 pseudo-code for agent j is presented below. 


Algorithm 1 for agent $j$:

Step 1: Perform Byzantine broadcast of local cost function$^3$ $h_j(x)$ to all the agents using any
Byzantine broadcast algorithm, such as [11].

In Step 1, agent $j$ should receive from each agent $i \in \mathcal{V}$ its cost function $h_i(x)$. For non-faulty
agent $i \in \mathcal{N}$, $h_i(x)$ will be an admissible function (admissible is defined in Section 1). If a faulty
agent $k \in \mathcal{F}$ does not correctly perform Byzantine broadcast of its cost function, or broadcasts
an inadmissible cost function, then hereafter assume $h_k(x)$ to be a default admissible cost
function that is known to all agents.

Step 2: The multiset of admissible functions obtained in Step 1 is $\{h_1(x), h_2(x), \dots, h_n(x)\}$. For
each $x \in \mathbb{R}$, define multisets $A(x)$, $B(x)$, $C(x)$ below, where $h_i'(x)$ denotes the gradient of
function $h_i(\cdot)$ at $x$.

$$A(x) = \{i : h_i'(x) > 0\}, \qquad B(x) = \{i : h_i'(x) < 0\}, \qquad C(x) = \{i : h_i'(x) = 0\}.$$

If there exists $x \in \mathbb{R}$ such that

$$\min_{F_1:\ F_1 \subseteq A(x) \text{ and } |F_1| \le f} \ \sum_{i \in A(x) - F_1} h_i'(x) \ + \max_{F_2:\ F_2 \subseteq B(x) \text{ and } |F_2| \le f} \ \sum_{i \in B(x) - F_2} h_i'(x) = 0, \qquad (1)$$

then deterministically choose output $\widetilde{x}$ to be any one $x$ value that satisfies (1);
otherwise, choose output $\widetilde{x} = \perp$.

3 In this step, each agent $j$ broadcasts a complete description of its cost function to other agents.


We will prove that Algorithm 1 solves Problem 3 with parameters $\beta = \frac{1}{2(|\mathcal{N}| - f)}$ and $\gamma = |\mathcal{N}| - f$.

For the multiset $\{h_1(x), h_2(x), \dots, h_n(x)\}$ of $n$ admissible cost functions gathered in Step 1
of Algorithm 1, define $F(x)$ and $G(x)$ as follows, where $A(x)$, $B(x)$ and $C(x)$ are as defined in
Algorithm 1.

$$F(x) = \min_{F_1:\ F_1 \subseteq A(x) \text{ and } |F_1| \le f} \ \sum_{i \in A(x) - F_1} h_i'(x)$$

and

$$G(x) = \max_{F_2:\ F_2 \subseteq B(x) \text{ and } |F_2| \le f} \ \sum_{i \in B(x) - F_2} h_i'(x).$$


Proposition 1. $F(x)$ and $G(x)$ are both non-decreasing functions of $x \in \mathbb{R}$.

The proof of Proposition 1 is presented in Appendix D.

Proposition 2. Both $F(x)$ and $G(x)$ are continuous functions of $x \in \mathbb{R}$.

The proof of Proposition 2 is presented in Appendix E.


Lemma 1. Algorithm 1 returns $\widetilde{x} \in \mathbb{R}$ when $n > 3f$ (i.e., it does not return $\perp$).

Proof. If there exists $x \in \mathbb{R}$ that satisfies equation (1) in Algorithm 1, then the algorithm will
not return $\perp$. Thus, to prove this lemma, it suffices to show that there exists $x \in \mathbb{R}$ that satisfies
equation (1). Consider the multiset of admissible functions $\{h_1(x), h_2(x), \dots, h_n(x)\}$ obtained by
a non-faulty agent in Step 1 of Algorithm 1. Define $X_j = \arg\min_{x\in\mathbb{R}} h_j(x)$. Let $\max X_j$ and $\min X_j$
denote the largest and smallest values in $X_j$, respectively. Sort the above $n$ functions $h_j(x)$ in
increasing order of their $\max X_j$ values, breaking ties arbitrarily. Let $i_0$ denote the $(f+1)$-th agent in
this sorted order (i.e., $i_0$ has the $(f+1)$-th smallest value in the above sorted order). Similarly, sort
the functions $h_j(x)$ in decreasing order of $\min X_j$ values, breaking ties arbitrarily. Let $j_0$ denote
the $(f+1)$-th agent in this sorted order (i.e., $j_0$ has the $(f+1)$-th largest value in the above sorted
order). Define function $H(\cdot)$ as

$$H(x) = F(x) + G(x).$$

Consider $x_1 \in X_{i_0}$ and $x_2 \in X_{j_0}$. Then, by the definition of $i_0$, $j_0$, $F(\cdot)$ and $G(\cdot)$, we have

$$H(x_1) = F(x_1) + G(x_1) = 0 + G(x_1) \le 0,$$

and

$$H(x_2) = F(x_2) + G(x_2) = F(x_2) + 0 \ge 0.$$

If $H(x_1) = 0$ or $H(x_2) = 0$, then $x_1$ or $x_2$, respectively, satisfies equation (1), proving the lemma.
(Note that $H(\cdot) = F(\cdot) + G(\cdot)$, and the definition of $F(\cdot)$ and $G(\cdot)$ implies that, if $H(x_i) = 0$ then
$x_i$ satisfies equation (1).)

Let us now consider the case when $H(x_1) < 0$ and $H(x_2) > 0$. By Propositions 1 and 2, we
have that $H(\cdot)$ is non-decreasing and continuous. Then it follows that $x_1 < x_2$, and there exists
$\widetilde{x} \in [x_1, x_2]$ such that $H(\widetilde{x}) = 0$, i.e., $\widetilde{x}$ satisfies equation (1), proving the lemma.

□
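
The argument of Lemma 1 suggests a simple numerical procedure: evaluate $H(x) = F(x) + G(x)$ and bisect between a point where $H \le 0$ and a point where $H \ge 0$. The sketch below is an illustration under stated assumptions rather than the report's algorithm: it takes the gradient functions as Python callables, expects the caller to supply the bracket (for instance $x_1$ and $x_2$ from the proof), and returns an approximate root, whereas Algorithm 1 outputs an exact one. All function names are hypothetical.

```python
from typing import Callable, Sequence

Gradient = Callable[[float], float]


def F_value(grad_values: Sequence[float], f: int) -> float:
    """Minimum sum of positive gradients after discarding at most f of them."""
    positives = sorted(g for g in grad_values if g > 0)
    keep = max(len(positives) - f, 0)      # drop the f largest positive values
    return sum(positives[:keep])


def G_value(grad_values: Sequence[float], f: int) -> float:
    """Maximum sum of negative gradients after discarding at most f of them."""
    negatives = sorted(g for g in grad_values if g < 0)  # most negative first
    return sum(negatives[f:])              # drop the f most negative values


def H_value(grads: Sequence[Gradient], f: int, x: float) -> float:
    values = [g(x) for g in grads]
    return F_value(values, f) + G_value(values, f)


def find_root(grads: Sequence[Gradient], f: int, lo: float, hi: float,
              tol: float = 1e-10, max_iter: int = 200) -> float:
    """Bisection on the non-decreasing, continuous function H over [lo, hi]."""
    if H_value(grads, f, lo) > 0 or H_value(grads, f, hi) < 0:
        raise ValueError("need H(lo) <= 0 <= H(hi)")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        h_mid = H_value(grads, f, mid)
        if h_mid == 0 or hi - lo < tol:
            return mid
        if h_mid < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With quadratic costs $h_i(x) = (x - a_i)^2$ for $a = (0, 1, 2, 3, 10, -5, 4)$ and $f = 2$, the bracket from the proof is $[1, 3]$ and the root is $x = 2$, the mean of the three middle minimizers.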


The next two theorems prove that Algorithm 1 can solve Problem 3 for $\gamma = |\mathcal{N}| - f$, proving
that the bound on $\gamma$ stated in Theorem 2 is tight for certain values of $\beta$ (as stated in the theorems
below).

Theorem 3. When $n > 3f$, Algorithm 1 solves Problem 3 with $\beta = \frac{1}{2(|\mathcal{N}| - f)}$ and $\gamma = |\mathcal{N}| - f$.

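Proof. By Lemma 1, we know that Algorithm 1 returns a value in $\mathbb{R}$. Let $\widetilde{x}$ be the output of
Algorithm 1 for the set of functions $\{h_1(x), h_2(x), \dots, h_n(x)\}$ gathered in Step 1 of the algorithm.
Consider $F_1^* \subseteq A(\widetilde{x})$ and $F_2^* \subseteq B(\widetilde{x})$, with $|F_1^*| \le f$ and $|F_2^*| \le f$, that minimize $\sum_{i \in A(\widetilde{x}) - F_1} h_i'(\widetilde{x})$
and maximize $\sum_{i \in B(\widetilde{x}) - F_2} h_i'(\widetilde{x})$, respectively (as per equation (1)).

Recall that $\mathcal{V} = \{1, \dots, n\}$. Sort the elements in the multiset $\{h_1'(\widetilde{x}), \dots, h_n'(\widetilde{x})\}$ in
non-increasing order, breaking ties in such a way that the elements corresponding to the agents in $F_1^*$
are among the first $f$ elements in the sorted order and the elements corresponding to the agents
in $F_2^*$ are among the last $f$ elements in the sorted order. Such a sorted order is well-defined since
$|F_1^*| \le f$ and $|F_2^*| \le f$. Let $F_1 \subseteq \mathcal{V}$ be the agents corresponding to the first $f$ elements in the sorted
order, and let $F_2 \subseteq \mathcal{V}$ be the agents corresponding to the last $f$ elements in the sorted order. Note
that $F_1^* \subseteq F_1$ and $F_2^* \subseteq F_2$. Since $A(\widetilde{x})$, $B(\widetilde{x})$ and $C(\widetilde{x})$ form a partition of $\mathcal{V}$, we have

$$\begin{aligned}
\sum_{i \in \mathcal{V} - F_1^* - F_2^*} h_i'(\widetilde{x})
&= \sum_{i \in C(\widetilde{x})} h_i'(\widetilde{x}) + \sum_{i \in A(\widetilde{x}) \cup B(\widetilde{x}) - F_1^* - F_2^*} h_i'(\widetilde{x}) \\
&\overset{(a)}{=} 0 + \sum_{i \in A(\widetilde{x}) \cup B(\widetilde{x}) - F_1^* - F_2^*} h_i'(\widetilde{x}) \\
&\overset{(b)}{=} 0 + 0 = 0. \qquad (2)
\end{aligned}$$

Equality (a) follows by definition of $C(\widetilde{x})$, and equality (b) is true because $\widetilde{x}$ satisfies equation (1).
Denote $\mathcal{R}^* = \mathcal{V} - F_1 - F_2$. Next we show that

$$\sum_{i \in \mathcal{R}^*} h_i'(\widetilde{x}) = 0. \qquad (3)$$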

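If $|A(\widetilde{x})| \ge f$, by definition of $F_1^*$, it holds that $|F_1^*| = f$. Thus, $F_1 = F_1^*$. Consequently, we have

$$\sum_{i \in F_1 - F_1^*} h_i'(\widetilde{x}) = \sum_{i \in \emptyset} h_i'(\widetilde{x}) = 0.$$

If $|A(\widetilde{x})| < f$, by definition of $F_1^*$ and $F_1$, and the fact that $F_1^* \subseteq F_1$, it follows that $F_1^* = A(\widetilde{x})$,
and $h_i'(\widetilde{x}) \le 0$ for each $i \in F_1 - F_1^* = F_1 - A(\widetilde{x}) \ne \emptyset$. In addition, if there exists $i \in F_1 - F_1^*$ such
that $h_i'(\widetilde{x}) < 0$, then by definition of $F_1$, we have $h_j'(\widetilde{x}) < 0$ for each $j \in \mathcal{V} - F_1$. So we get

$$\begin{aligned}
0 &= \sum_{i \in \mathcal{V} - F_1^* - F_2^*} h_i'(\widetilde{x}) && \text{by (2)} \\
&= \sum_{i \in \mathcal{V} - F_1 - F_2^*} h_i'(\widetilde{x}) + \sum_{i \in F_1 - F_1^*} h_i'(\widetilde{x}) && \text{since } F_1^* \subseteq F_1 \\
&\le \sum_{i \in \mathcal{V} - F_1 - F_2^*} h_i'(\widetilde{x}) && \text{since } h_i'(\widetilde{x}) \le 0 \text{ for each } i \in F_1 - F_1^* \\
&< 0 && \text{since } h_i'(\widetilde{x}) < 0 \text{ for each } i \in \mathcal{V} - F_1,
\end{aligned}$$

where the last sum is over a nonempty index set because $n > 3f$. This is a contradiction. Thus, there
does not exist $i \in F_1 - F_1^*$ such that $h_i'(\widetilde{x}) < 0$, i.e., $h_i'(\widetilde{x}) = 0$
for each $i \in F_1 - F_1^*$. Consequently, we have

$$\sum_{i \in F_1 - F_1^*} h_i'(\widetilde{x}) = \sum_{i \in F_1 - F_1^*} 0 = 0.$$

Hence, regardless of the size of $|A(\widetilde{x})|$, the following is always true.

$$\sum_{i \in F_1 - F_1^*} h_i'(\widetilde{x}) = 0. \qquad (4)$$

Similarly, we can show that

$$\sum_{i \in F_2 - F_2^*} h_i'(\widetilde{x}) = 0. \qquad (5)$$

Therefore, we have

$$\begin{aligned}
0 &= \sum_{i \in \mathcal{V} - F_1^* - F_2^*} h_i'(\widetilde{x}) && \text{by (2)} \\
&= \sum_{i \in \mathcal{V} - F_1 - F_2} h_i'(\widetilde{x}) + \sum_{i \in F_1 - F_1^*} h_i'(\widetilde{x}) + \sum_{i \in F_2 - F_2^*} h_i'(\widetilde{x}) && \text{since } F_1^* \subseteq F_1 \text{ and } F_2^* \subseteq F_2 \\
&= \sum_{i \in \mathcal{R}^*} h_i'(\widetilde{x}) + \sum_{i \in F_1 - F_1^*} h_i'(\widetilde{x}) + \sum_{i \in F_2 - F_2^*} h_i'(\widetilde{x}) && \text{by definition of } \mathcal{R}^* \\
&= \sum_{i \in \mathcal{R}^*} h_i'(\widetilde{x}) + 0 + 0 && \text{by (4) and (5)} \\
&= \sum_{i \in \mathcal{R}^*} h_i'(\widetilde{x}),
\end{aligned}$$

proving equation (3).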

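Let $\widehat{F}_1 \subseteq F_1 - \mathcal{F}$ and $\widehat{F}_2 \subseteq F_2 - \mathcal{F}$ such that

$$|\widehat{F}_1| = f - \phi + |\mathcal{R}^* \cap \mathcal{F}| \quad\text{and}\quad |\widehat{F}_2| = f - \phi + |\mathcal{R}^* \cap \mathcal{F}|. \qquad (6)$$

Since $|\mathcal{F}| = \phi \le f$, $|F_1| = f = |F_2|$ and $\mathcal{R}^* \cup F_1 \cup F_2 = \mathcal{V}$, it holds that

$$|F_1 - \mathcal{F}| \ge f - \phi + |\mathcal{R}^* \cap \mathcal{F}| \quad\text{and}\quad |F_2 - \mathcal{F}| \ge f - \phi + |\mathcal{R}^* \cap \mathcal{F}|.$$

Thus, $\widehat{F}_1$ and $\widehat{F}_2$ are well-defined.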

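We now show that

$$\sum_{i \in \widehat{F}_1} h_i'(\widetilde{x}) \ge 0 \quad\text{and}\quad \sum_{i \in \widehat{F}_2} h_i'(\widetilde{x}) \le 0. \qquad (7)$$

Suppose $\sum_{i \in \widehat{F}_1} h_i'(\widetilde{x}) < 0$; then there exists $i_0 \in \widehat{F}_1 \subseteq F_1 - \mathcal{F}$ such that $h_{i_0}'(\widetilde{x}) < 0$. Since agents
in $F_1$ have the $f$ largest values (including ties) in the set $\{h_1'(\widetilde{x}), \dots, h_n'(\widetilde{x})\}$, then $h_i'(\widetilde{x}) < 0$ for each
$i \in \mathcal{R}^*$, contradicting the fact that (3) holds. Analogously, it can be shown that $\sum_{i \in \widehat{F}_2} h_i'(\widetilde{x}) \le 0$.
In addition, we observe that

$$\sum_{i \in \widehat{F}_2} h_i'(\widetilde{x}) \le \sum_{i \in \mathcal{R}^* \cap \mathcal{F}} h_i'(\widetilde{x}) \le \sum_{i \in \widehat{F}_1} h_i'(\widetilde{x}). \qquad (8)$$

To see this, consider three possibilities: (i) $\sum_{i \in \mathcal{R}^* \cap \mathcal{F}} h_i'(\widetilde{x}) = 0$, (ii) $\sum_{i \in \mathcal{R}^* \cap \mathcal{F}} h_i'(\widetilde{x}) > 0$, and (iii)
$\sum_{i \in \mathcal{R}^* \cap \mathcal{F}} h_i'(\widetilde{x}) < 0$.

First consider the case when $\sum_{i \in \mathcal{R}^* \cap \mathcal{F}} h_i'(\widetilde{x}) = 0$. Due to (7) and the case assumption, it holds
that

$$\sum_{i \in \widehat{F}_2} h_i'(\widetilde{x}) \le 0 = \sum_{i \in \mathcal{R}^* \cap \mathcal{F}} h_i'(\widetilde{x}) = 0 \le \sum_{i \in \widehat{F}_1} h_i'(\widetilde{x}),$$

which is (8).

Now consider the case when $\sum_{i \in \mathcal{R}^* \cap \mathcal{F}} h_i'(\widetilde{x}) > 0$. Since $\sum_{i \in \mathcal{R}^* \cap \mathcal{F}} h_i'(\widetilde{x}) > 0$, it follows that
$\mathcal{R}^* \cap \mathcal{F} \ne \emptyset$, and there exists $k \in \mathcal{R}^* \cap \mathcal{F}$ such that $h_k'(\widetilde{x}) > 0$. This implies that $h_i'(\widetilde{x}) > 0$ for
each $i \in F_1$. Let $\mu = \min_{i \in F_1} h_i'(\widetilde{x})$. Note that $\mu > 0$. By definition of $F_1$, it follows that

$$h_i'(\widetilde{x}) \le \mu \le h_j'(\widetilde{x}),$$

for each $i \in \mathcal{R}^*$ and $j \in F_1$. Thus, we obtain

$$\sum_{i \in \mathcal{R}^* \cap \mathcal{F}} h_i'(\widetilde{x}) \le \sum_{i \in \mathcal{R}^* \cap \mathcal{F}} \mu = |\mathcal{R}^* \cap \mathcal{F}|\,\mu \le |\widehat{F}_1|\,\mu = \sum_{i \in \widehat{F}_1} \mu \le \sum_{i \in \widehat{F}_1} h_i'(\widetilde{x}). \qquad (9)$$

Due to (7) and the assumption that $\sum_{i \in \mathcal{R}^* \cap \mathcal{F}} h_i'(\widetilde{x}) > 0$, we get

$$\sum_{i \in \widehat{F}_2} h_i'(\widetilde{x}) \le 0 < \sum_{i \in \mathcal{R}^* \cap \mathcal{F}} h_i'(\widetilde{x}) \le \sum_{i \in \widehat{F}_1} h_i'(\widetilde{x}),$$

proving relation (8).

Similarly, we can show the case when $\sum_{i \in \mathcal{R}^* \cap \mathcal{F}} h_i'(\widetilde{x}) < 0$.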

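Since the relation in (8) holds, there exists $0 \le \zeta \le 1$ such that

$$\sum_{i \in \mathcal{R}^* \cap \mathcal{F}} h_i'(\widetilde{x}) = \zeta \Big( \sum_{i \in \widehat{F}_1} h_i'(\widetilde{x}) \Big) + (1 - \zeta) \Big( \sum_{i \in \widehat{F}_2} h_i'(\widetilde{x}) \Big). \qquad (10)$$

Thus, we have

$$\begin{aligned}
0 &= \sum_{i \in \mathcal{R}^*} h_i'(\widetilde{x}) && \text{from (3)} \\
&= \sum_{i \in \mathcal{R}^* - \mathcal{F}} h_i'(\widetilde{x}) + \sum_{i \in \mathcal{R}^* \cap \mathcal{F}} h_i'(\widetilde{x}) \\
&= \sum_{i \in \mathcal{R}^* - \mathcal{F}} h_i'(\widetilde{x}) + \zeta \Big( \sum_{i \in \widehat{F}_1} h_i'(\widetilde{x}) \Big) + (1 - \zeta) \Big( \sum_{i \in \widehat{F}_2} h_i'(\widetilde{x}) \Big) && \text{using (10).}
\end{aligned}$$

Thus $\widetilde{x}$ is an optimum of function

$$\sum_{i \in \mathcal{R}^* - \mathcal{F}} h_i(x) + \zeta \Big( \sum_{i \in \widehat{F}_1} h_i(x) \Big) + (1 - \zeta) \Big( \sum_{i \in \widehat{F}_2} h_i(x) \Big).$$

Since constant scaling does not change optima, it follows that $\widetilde{x}$ is an optimum of function

$$\frac{1}{|\mathcal{R}^* - \mathcal{F}| + \zeta |\widehat{F}_1| + (1 - \zeta) |\widehat{F}_2|} \Big( \sum_{i \in \mathcal{R}^* - \mathcal{F}} h_i(x) + \zeta \sum_{i \in \widehat{F}_1} h_i(x) + (1 - \zeta) \sum_{i \in \widehat{F}_2} h_i(x) \Big). \qquad (11)$$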


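Since $|\mathcal{R}^*| = n - 2f$ and $|\widehat{F}_1| = f - \phi + |\mathcal{R}^* \cap \mathcal{F}| = |\widehat{F}_2|$, we have

$$\begin{aligned}
|\mathcal{R}^* - \mathcal{F}| + \zeta |\widehat{F}_1| + (1 - \zeta) |\widehat{F}_2|
&= |\mathcal{R}^* - \mathcal{F}| + |\widehat{F}_1| && \text{since } |\widehat{F}_1| = |\widehat{F}_2| \\
&= |\mathcal{R}^* - \mathcal{F}| + f - \phi + |\mathcal{R}^* \cap \mathcal{F}| \\
&= |\mathcal{R}^*| - |\mathcal{R}^* \cap \mathcal{F}| + f - \phi + |\mathcal{R}^* \cap \mathcal{F}| \\
&= |\mathcal{R}^*| + f - \phi = n - 2f + f - \phi = n - \phi - f = |\mathcal{N}| - f.
\end{aligned}$$

We know that either $\zeta \ge \frac{1}{2}$ or $1 - \zeta \ge \frac{1}{2}$; by symmetry, without loss of generality, assume $\zeta \ge \frac{1}{2}$. In
addition, we know

$$\begin{aligned}
|(\mathcal{R}^* - \mathcal{F}) \cup \widehat{F}_1| &= |\mathcal{R}^* - \mathcal{F}| + |\widehat{F}_1| \\
&= |\mathcal{R}^*| - |\mathcal{R}^* \cap \mathcal{F}| + f - \phi + |\mathcal{R}^* \cap \mathcal{F}| = |\mathcal{N}| - f.
\end{aligned}$$

Recall that $\mathcal{R}^* \cup F_1 \cup F_2 = \mathcal{V}$. Thus, in function (11), which is a weighted sum of $|\mathcal{N}|$ local cost
functions corresponding to agents in $\mathcal{N} = \mathcal{V} - \mathcal{F}$, at least $|\mathcal{N}| - f$ local cost functions corresponding
to $i \in (\mathcal{R}^* - \mathcal{F}) \cup \widehat{F}_1$ have weights that are lower bounded by $\frac{1}{2(|\mathcal{N}| - f)}$.

Similarly, when $1 - \zeta \ge \frac{1}{2}$, at least $|\mathcal{N}| - f$ cost functions corresponding to $i \in
(\mathcal{R}^* - \mathcal{F}) \cup \widehat{F}_2$ have weights lower bounded by $\frac{1}{2(|\mathcal{N}| - f)}$.

□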


Algorithm 1 has the following alternative performance guarantee. The relative strength of
Theorem 3 and Theorem 4 depends on the value of $n$. For instance, when $n \ge 4f + 1$, it holds that

$$\frac{1}{n} \ge \frac{1}{2(|\mathcal{N}| - f)}.$$

Theorem 4. When $n \ge 3f + 1$, Algorithm 1 solves Problem 3 with $\beta = \frac{1}{n}$ and $\gamma = |\mathcal{N}| - f$.

Proof. From the proof of Theorem 3, we know that under Algorithm 1, (7) holds, i.e.,

$$\sum_{i \in \widehat{F}_1} h_i'(\widetilde{x}) \ge 0 \quad\text{and}\quad \sum_{i \in \widehat{F}_2} h_i'(\widetilde{x}) \le 0.$$

Then there exists $0 \le \zeta \le 1$ such that

$$0 = \zeta \Big( \sum_{i \in \widehat{F}_1} h_i'(\widetilde{x}) \Big) + (1 - \zeta) \Big( \sum_{i \in \widehat{F}_2} h_i'(\widetilde{x}) \Big). \qquad (12)$$

Note that either $\zeta \ge \frac{1}{2}$ or $1 - \zeta \ge \frac{1}{2}$. Our proof focuses on the scenario when $\zeta \ge \frac{1}{2}$; the scenario
when $1 - \zeta \ge \frac{1}{2}$ can be proved analogously.

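Recall from (3) that

$$0 = \sum_{i \in \mathcal{R}^*} h_i'(\widetilde{x}) = \sum_{i \in \mathcal{R}^* - \mathcal{F}} h_i'(\widetilde{x}) + \sum_{i \in \mathcal{R}^* \cap \mathcal{F}} h_i'(\widetilde{x}).$$

We now consider two cases: (i) $\sum_{i \in \mathcal{R}^* \cap \mathcal{F}} h_i'(\widetilde{x}) = 0$, and (ii) $\sum_{i \in \mathcal{R}^* \cap \mathcal{F}} h_i'(\widetilde{x}) \ne 0$.

Case (i): $\sum_{i \in \mathcal{R}^* \cap \mathcal{F}} h_i'(\widetilde{x}) = 0$. In this case, we have

$$\begin{aligned}
0 &= \sum_{i \in \mathcal{R}^*} h_i'(\widetilde{x}) && \text{from (3)} \\
&= \sum_{i \in \mathcal{R}^* - \mathcal{F}} h_i'(\widetilde{x}) + \sum_{i \in \mathcal{R}^* \cap \mathcal{F}} h_i'(\widetilde{x}) \\
&= \sum_{i \in \mathcal{R}^* - \mathcal{F}} h_i'(\widetilde{x}) + 0 && \text{due to the assumption in case (i)} \\
&= \sum_{i \in \mathcal{R}^* - \mathcal{F}} h_i'(\widetilde{x}). \qquad (13)
\end{aligned}$$

Multiplying both sides of (13) by $\zeta$, we get

$$\begin{aligned}
0 = \zeta \cdot 0 &= \zeta \Big( \sum_{i \in \mathcal{R}^* - \mathcal{F}} h_i'(\widetilde{x}) \Big) \\
&= \zeta \Big( \sum_{i \in \mathcal{R}^* - \mathcal{F}} h_i'(\widetilde{x}) \Big) + 0 \\
&= \zeta \Big( \sum_{i \in \mathcal{R}^* - \mathcal{F}} h_i'(\widetilde{x}) \Big) + \zeta \Big( \sum_{i \in \widehat{F}_1} h_i'(\widetilde{x}) \Big) + (1 - \zeta) \Big( \sum_{i \in \widehat{F}_2} h_i'(\widetilde{x}) \Big) && \text{using (12).}
\end{aligned}$$

Thus $\widetilde{x}$ is an optimum of function

$$\zeta \Big( \sum_{i \in \mathcal{R}^* - \mathcal{F}} h_i(x) \Big) + \zeta \Big( \sum_{i \in \widehat{F}_1} h_i(x) \Big) + (1 - \zeta) \Big( \sum_{i \in \widehat{F}_2} h_i(x) \Big).$$

Since constant scaling does not change optima, it follows that $\widetilde{x}$ is an optimum of function

$$\frac{1}{\zeta |\mathcal{R}^* - \mathcal{F}| + \zeta |\widehat{F}_1| + (1 - \zeta) |\widehat{F}_2|} \Big( \zeta \sum_{i \in \mathcal{R}^* - \mathcal{F}} h_i(x) + \zeta \sum_{i \in \widehat{F}_1} h_i(x) + (1 - \zeta) \sum_{i \in \widehat{F}_2} h_i(x) \Big). \qquad (14)$$

Since $\zeta \ge \frac{1}{2} \ge 1 - \zeta$, $|\mathcal{R}^*| = n - 2f$ and $|\widehat{F}_1| = f - \phi + |\mathcal{R}^* \cap \mathcal{F}| = |\widehat{F}_2|$, we get

$$\begin{aligned}
\zeta |\mathcal{R}^* - \mathcal{F}| + \zeta |\widehat{F}_1| + (1 - \zeta) |\widehat{F}_2|
&\le \zeta |\mathcal{R}^* - \mathcal{F}| + \zeta |\widehat{F}_1| + \zeta |\widehat{F}_2| && \text{since } \zeta \ge \tfrac{1}{2} \ge 1 - \zeta \\
&= \zeta \big( |\mathcal{R}^* - \mathcal{F}| + |\widehat{F}_1| + |\widehat{F}_2| \big) \\
&= \zeta \big( |\mathcal{R}^*| - |\mathcal{R}^* \cap \mathcal{F}| + 2f - 2\phi + 2 |\mathcal{R}^* \cap \mathcal{F}| \big) \\
&= \zeta \big( n - 2f + 2f - 2\phi + |\mathcal{R}^* \cap \mathcal{F}| \big) \\
&\le \zeta \big( n - 2f + 2f - 2\phi + \phi \big) && \text{since } |\mathcal{R}^* \cap \mathcal{F}| \le |\mathcal{F}| = \phi \\
&= \zeta (n - \phi) = \zeta |\mathcal{N}| \le \zeta n.
\end{aligned}$$

In addition, we know

$$\begin{aligned}
|(\mathcal{R}^* - \mathcal{F}) \cup \widehat{F}_1| &= |\mathcal{R}^* - \mathcal{F}| + |\widehat{F}_1| \\
&= |\mathcal{R}^*| - |\mathcal{R}^* \cap \mathcal{F}| + f - \phi + |\mathcal{R}^* \cap \mathcal{F}| \\
&= n - 2f + f - \phi = n - \phi - f = |\mathcal{N}| - f.
\end{aligned}$$

Recall that $\mathcal{R}^* \cup F_1 \cup F_2 = \mathcal{V}$. Thus, in function (14), which is a weighted sum of $|\mathcal{N}|$ local cost
functions corresponding to agents in $\mathcal{N} = \mathcal{V} - \mathcal{F}$, at least $|\mathcal{N}| - f$ local cost functions corresponding
to $i \in (\mathcal{R}^* - \mathcal{F}) \cup \widehat{F}_1$ have weights that are lower bounded by $\frac{1}{n}$.

Similarly, when $1 - \zeta \ge \frac{1}{2}$, at least $|\mathcal{N}| - f$ cost functions corresponding to $i \in
(\mathcal{R}^* - \mathcal{F}) \cup \widehat{F}_2$ have weights lower bounded by $\frac{1}{n}$.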

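Case (ii): $\sum_{i \in \mathcal{R}^* \cap \mathcal{F}} h_i'(\widetilde{x}) \ne 0$. By symmetry, without loss of generality, assume that
$\sum_{i \in \mathcal{R}^* \cap \mathcal{F}} h_i'(\widetilde{x}) > 0$. Then $\mathcal{R}^* \cap \mathcal{F} \ne \emptyset$ and there exists $j \in \mathcal{R}^* \cap \mathcal{F}$ such that $h_j'(\widetilde{x}) > 0$.
This implies that $F_1^* = F_1$.

By (8), we get

$$0 < \sum_{i \in \mathcal{R}^* \cap \mathcal{F}} h_i'(\widetilde{x}) \le \sum_{i \in \widehat{F}_1} h_i'(\widetilde{x}).$$

Thus, there exists $0 \le \zeta_1 \le 1$ such that

$$\sum_{i \in \mathcal{R}^* \cap \mathcal{F}} h_i'(\widetilde{x}) = \zeta_1 \Big( \sum_{i \in \widehat{F}_1} h_i'(\widetilde{x}) \Big). \qquad (15)$$

By equation (3), we have

$$\begin{aligned}
0 &= \sum_{i \in \mathcal{R}^* - \mathcal{F}} h_i'(\widetilde{x}) + \sum_{i \in \mathcal{R}^* \cap \mathcal{F}} h_i'(\widetilde{x}) \\
&= \sum_{i \in \mathcal{R}^* - \mathcal{F}} h_i'(\widetilde{x}) + \zeta_1 \Big( \sum_{i \in \widehat{F}_1} h_i'(\widetilde{x}) \Big) && \text{by (15).} \qquad (16)
\end{aligned}$$

Multiplying both sides of (16) by $\zeta$, we get

$$\begin{aligned}
0 = \zeta \cdot 0 &= \zeta \Big( \sum_{i \in \mathcal{R}^* - \mathcal{F}} h_i'(\widetilde{x}) + \zeta_1 \sum_{i \in \widehat{F}_1} h_i'(\widetilde{x}) \Big) \\
&= \zeta \Big( \sum_{i \in \mathcal{R}^* - \mathcal{F}} h_i'(\widetilde{x}) + \zeta_1 \sum_{i \in \widehat{F}_1} h_i'(\widetilde{x}) \Big) + 0 \\
&= \zeta \Big( \sum_{i \in \mathcal{R}^* - \mathcal{F}} h_i'(\widetilde{x}) + \zeta_1 \sum_{i \in \widehat{F}_1} h_i'(\widetilde{x}) \Big) + \zeta \Big( \sum_{i \in \widehat{F}_1} h_i'(\widetilde{x}) \Big) + (1 - \zeta) \Big( \sum_{i \in \widehat{F}_2} h_i'(\widetilde{x}) \Big) && \text{by (12).}
\end{aligned}$$

Thus $\widetilde{x}$ is an optimum of function

$$\zeta \Big( \sum_{i \in \mathcal{R}^* - \mathcal{F}} h_i(x) + \zeta_1 \sum_{i \in \widehat{F}_1} h_i(x) \Big) + \zeta \Big( \sum_{i \in \widehat{F}_1} h_i(x) \Big) + (1 - \zeta) \Big( \sum_{i \in \widehat{F}_2} h_i(x) \Big).$$

Define $\chi = \zeta \big( |\mathcal{R}^* - \mathcal{F}| + \zeta_1 |\widehat{F}_1| \big) + \zeta |\widehat{F}_1| + (1 - \zeta) |\widehat{F}_2|$. Since constant scaling does not change
optima, we know that $\widetilde{x}$ is an optimum of function

$$\frac{1}{\chi} \Big( \zeta \Big( \sum_{i \in \mathcal{R}^* - \mathcal{F}} h_i(x) + \zeta_1 \sum_{j \in \widehat{F}_1} h_j(x) \Big) + \zeta \sum_{i \in \widehat{F}_1} h_i(x) + (1 - \zeta) \sum_{i \in \widehat{F}_2} h_i(x) \Big). \qquad (17)$$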

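Since $\mathcal{R}^* \cap \mathcal{F} \ne \emptyset$, it holds that $|\widehat{F}_1| = f - \phi + |\mathcal{R}^* \cap \mathcal{F}| \ge |\mathcal{R}^* \cap \mathcal{F}| > 0$, i.e., $\widehat{F}_1 \ne \emptyset$. Define

$$\min_{j \in \widehat{F}_1} h_j'(\widetilde{x}) = \mu.$$

By definition of $\mu$, it holds that $\mu \le h_i'(\widetilde{x})$ for $i \in \widehat{F}_1$. Also, $\mu > 0$ because $h_i'(\widetilde{x}) > 0$ for $i \in \widehat{F}_1 \subseteq
F_1 = F_1^* \subseteq A(\widetilde{x})$. In addition, by the construction of $\mathcal{R}^*$, we have $h_i'(\widetilde{x}) \le \mu$ for each $i \in \mathcal{R}^* \cap \mathcal{F}$.
Then from (15)

$$\begin{aligned}
\zeta_1 |\widehat{F}_1| &= \frac{\sum_{i \in \mathcal{R}^* \cap \mathcal{F}} h_i'(\widetilde{x})}{\sum_{i \in \widehat{F}_1} h_i'(\widetilde{x})} \, |\widehat{F}_1| \\
&\le \frac{\sum_{i \in \mathcal{R}^* \cap \mathcal{F}} \mu}{\sum_{i \in \widehat{F}_1} \mu} \, |\widehat{F}_1|
= \frac{|\mathcal{R}^* \cap \mathcal{F}|}{|\widehat{F}_1|} \, |\widehat{F}_1|
= |\mathcal{R}^* \cap \mathcal{F}|. \qquad (18)
\end{aligned}$$

Then

$$\begin{aligned}
\chi &= \zeta \big( |\mathcal{R}^* - \mathcal{F}| + \zeta_1 |\widehat{F}_1| \big) + \zeta |\widehat{F}_1| + (1 - \zeta) |\widehat{F}_2| \\
&\le \zeta \big( |\mathcal{R}^* - \mathcal{F}| + |\mathcal{R}^* \cap \mathcal{F}| \big) + \zeta |\widehat{F}_1| + (1 - \zeta) |\widehat{F}_2| && \text{by (18)} \\
&\le \zeta |\mathcal{R}^*| + \zeta |\widehat{F}_1| + \zeta |\widehat{F}_2| && \text{since } \zeta \ge \tfrac{1}{2} \ge 1 - \zeta \\
&= \zeta \big( |\mathcal{R}^*| + |\widehat{F}_1| + |\widehat{F}_2| \big) \\
&= \zeta \big( n - 2f + 2f - 2\phi + 2 |\mathcal{R}^* \cap \mathcal{F}| \big) \\
&\le \zeta \big( n - 2f + 2f - 2\phi + 2\phi \big) && \text{since } |\mathcal{R}^* \cap \mathcal{F}| \le |\mathcal{F}| = \phi \\
&= \zeta n.
\end{aligned}$$

In addition, we know

$$\begin{aligned}
|(\mathcal{R}^* - \mathcal{F}) \cup \widehat{F}_1| &= |\mathcal{R}^* - \mathcal{F}| + |\widehat{F}_1| \\
&= |\mathcal{R}^*| - |\mathcal{R}^* \cap \mathcal{F}| + |\widehat{F}_1| \\
&= n - 2f - |\mathcal{R}^* \cap \mathcal{F}| + f - \phi + |\mathcal{R}^* \cap \mathcal{F}| \\
&= n - \phi - f = |\mathcal{N}| - f.
\end{aligned}$$

Thus, in function (17), at least $|\mathcal{N}| - f$ local cost functions corresponding to $i \in (\mathcal{R}^* - \mathcal{F}) \cup \widehat{F}_1$ are
assigned weights that are lower bounded by $\frac{1}{n}$. A similar result holds when $1 - \zeta \ge \frac{1}{2}$.
Cases (i) and (ii) together prove the theorem.

□

Recall that $H(\cdot) = F(\cdot) + G(\cdot)$. We will use the result below later to prove correctness of
Algorithm 3. Let $\mathrm{Cov}(\cdot)$ be the convex hull of a given set.

Theorem 5. For given $\mathcal{N}$ and $\mathcal{F}$, there exists a convex and differentiable function $\mathbf{H}(\cdot)$ defined
over any finite interval $[c, d] \supseteq \mathrm{Cov}(\cup_{i \in \mathcal{N}} X_i)$ such that the derivative function of $\mathbf{H}(\cdot)$ is $H(\cdot)$,
i.e., $\mathbf{H}'(x) = H(x)$ for each $x \in [c, d]$, where $\mathrm{Cov}(\cup_{i \in \mathcal{N}} X_i) \subseteq [c, d]$.

Proof. We prove the existence of $\mathbf{H}(\cdot)$ by construction. Specifically, we show that function $H(\cdot)$ is
integrable over $[c, d] \supseteq \mathrm{Cov}(\cup_{i \in \mathcal{N}} X_i)$.

By definition of admissible functions, $X_i = \arg\min_{x \in \mathbb{R}} h_i(x)$ is nonempty and compact (closed
and bounded). $\mathrm{Cov}(\cup_{i \in \mathcal{N}} X_i)$ is the convex hull spanned by the union of the $X_i$'s for all $i \in \mathcal{N}$. Thus
$\mathrm{Cov}(\cup_{i \in \mathcal{N}} X_i)$ is convex and compact. In addition, by Propositions 1 and 2, we know that function
$H(\cdot) = F(\cdot) + G(\cdot)$ is non-decreasing and continuous.

As stated in the theorem, $\mathrm{Cov}(\cup_{i \in \mathcal{N}} X_i) \subseteq [c, d]$. Let $H(\cdot)|_{[c,d]}$ be the restriction of function
$H(\cdot)$ to the closed interval $[c, d]$. Then we know that $H(\cdot)|_{[c,d]}$ is Riemann integrable over the closed
interval $[c, d]$.

For $x \in [c, d]$, define $\mathbf{H}(x)$ by

$$\mathbf{H}(x) = \int_c^x H(t)\, dt.$$

Since $H(\cdot)$ is continuous on $[c, d]$, we know $\mathbf{H}(\cdot)$ is differentiable and

$$\mathbf{H}'(x) = H(x),$$

for all $x \in (c, d)$ [20].

In addition, by the fact that a scalar differentiable function is convex on an interval if and only
if its derivative is non-decreasing on that interval, we know that $\mathbf{H}(\cdot)$ is convex.

□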


It is easy to see that the function $\mathbf{H}(\cdot)$ defined in Theorem 5 (the antiderivative of $H(\cdot)$) is
$nL$-Lipschitz continuous on any finite interval $[c, d] \supseteq \mathrm{Cov}(\cup_{i \in \mathcal{N}} X_i)$.

Remark 1. The correctness of Algorithm 1 implies that $H(\widetilde{x}) = 0$ and $\widetilde{x} \in \mathrm{Cov}(\cup_{i \in \mathcal{N}} X_i)$, where the
latter claim follows from Proposition 5, proved in Appendix A. Essentially, Algorithm 1 outputs an
optimum of the following constrained convex optimization problem, where $\mathrm{Cov}(\cup_{i \in \mathcal{N}} X_i) \subseteq [c, d]$:

$$\min_x \ \mathbf{H}(x) \quad \text{s.t.} \quad x \in [c, d]. \qquad (19)$$

4.2 Algorithm 2

Algorithm 2 is an alternative to Algorithm 1. This construction admits more concise proofs.
Step 1 of Algorithm 2 is identical to that of Algorithm 1. What distinguishes Algorithm 2 from
Algorithm 1 is the decision rule described in Step 2.


Algorithm 2 for agent $j$:

Step 1: Perform Byzantine broadcast of local cost function $h_j(x)$ to all the agents using any
Byzantine broadcast algorithm, such as [11].

In Step 1, agent $j$ should receive from each agent $i \in \mathcal{V}$ its cost function $h_i(x)$. For non-faulty
agent $i \in \mathcal{N}$, $h_i(x)$ will be an admissible function (admissible is defined in Section 1). If a faulty
agent $k \in \mathcal{F}$ does not correctly perform Byzantine broadcast of its cost function, or broadcasts
an inadmissible cost function, then hereafter assume $h_k(x)$ to be a default admissible cost
function that is known to all agents.

Step 2: The multiset of admissible functions obtained in Step 1 is $\{h_1(x), h_2(x), \dots, h_n(x)\}$ and
the multiset of the derivatives of these functions is $\{h_1'(x), h_2'(x), \dots, h_n'(x)\}$.

For $1 \le k \le n$, define

$$g_k(x) = k\text{-th largest value (including ties) in the multiset } \{h_1'(x), h_2'(x), \dots, h_n'(x)\}, \qquad (20)$$

for each $x \in \mathbb{R}$. If there exists $x \in \mathbb{R}$ such that

$$\sum_{k = f+1}^{n-f} g_k(x) = 0, \qquad (21)$$

then deterministically choose output $\widetilde{x}$ to be any one $x$ value that satisfies (21);
otherwise, choose output $\widetilde{x} = \perp$.

Similar to Algorithm 1, we can also show that Algorithm 2 solves Problem 3 with parameters
$\beta = \frac{1}{2(|\mathcal{N}| - f)}$ and $\gamma = |\mathcal{N}| - f$. Our correctness proof is based on the following fact.

Proposition 3. For each $1 \le k \le n$, the function $g_k(x)$ defined in (20) is a continuous
non-decreasing function.

The proof of Proposition 3 is presented in Appendix F.

Lemma 2. Algorithm 2 returns $\widetilde{x} \in \mathbb{R}$ when $n > 3f$ (i.e., it does not return $\perp$).

The proof of Lemma 2 is similar to the proof of Lemma 1. We present it in Appendix G.

The next two theorems prove that Algorithm 2 can solve Problem 3 for $\gamma = |\mathcal{N}| - f$.

Theorem 6. When $n > 3f$, Algorithm 2 solves Problem 3 with $\beta = \frac{1}{2(|\mathcal{N}| - f)}$ and $\gamma = |\mathcal{N}| - f$.

Theorem 7. When $n > 3f$, Algorithm 2 solves Problem 3 with $\beta = \frac{1}{n}$ and $\gamma = |\mathcal{N}| - f$.

The proofs of Theorems 6 and 7 are presented in Appendix H.
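
The decision rule of Algorithm 2 can be sketched numerically: sort the $n$ gradient values at $x$, discard the $f$ largest and the $f$ smallest, and look for a point where the remaining sum is zero. Since each sorted gradient value is continuous and non-decreasing in $x$ (Proposition 3), bisection applies. The sketch below is illustrative only: the function names are hypothetical, the bracket is supplied by the caller, and the result is an approximate root rather than the exact value the algorithm specifies.

```python
from typing import Callable, Sequence

Gradient = Callable[[float], float]


def trimmed_gradient_sum(grads: Sequence[Gradient], f: int, x: float) -> float:
    """Sum of the (f+1)-th through (n-f)-th largest gradient values at x."""
    values = sorted((g(x) for g in grads), reverse=True)  # largest first
    return sum(values[f:len(values) - f])


def solve_trimmed_sum(grads: Sequence[Gradient], f: int, lo: float, hi: float,
                      tol: float = 1e-10, max_iter: int = 200) -> float:
    """Bisection for a zero of the trimmed gradient sum on [lo, hi]."""
    if trimmed_gradient_sum(grads, f, lo) > 0 or trimmed_gradient_sum(grads, f, hi) < 0:
        raise ValueError("need a non-positive value at lo and a non-negative value at hi")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        s_mid = trimmed_gradient_sum(grads, f, mid)
        if s_mid == 0 or hi - lo < tol:
            return mid
        if s_mid < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For quadratic costs $(x - a_i)^2$ with $a = (0, 1, 2, 3, 10, -5, 4)$ and $f = 2$, the trimmed sum equals $2(3x - 6)$ for every $x$, so the procedure converges to $x = 2$.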
4.3 Algorithm 3

Both Algorithm 1 and Algorithm 2 may be impractical because they require entire cost functions to 
be exchanged between the agents. Unlike Algorithms 1 and 2, the iterative Algorithm 3 presented 
below does not require the agents to exchange their local cost functions in their entirety. Instead, 
the agents exchange gradients of their local cost functions in each iteration. This algorithm, derived 
from the gradient-based methods for convex optimization, is more suitable for practical purposes. 

For convergence of our gradient-based algorithm (Algorithm 3), we now impose the additional 
restriction that the admissible functions also be L-Lipschitz continuous. This restriction is assumed 
in the rest of this section, even if it is not stated explicitly again. 

In Algorithm 3, each agent $j$ computes a state variable $x_j[t]$ in the $t$-th iteration, $t \ge 1$,
as elaborated below. We assume that the entire sequence $x_j[t]$ is saved by agent $j$. Due to this
requirement, Algorithm 3 requires additional memory, rather than keeping only the minimal state $x_j[t]$.
In the next subsection, we reduce this state maintenance requirement.


Algorithm 3 for agent j: 

Initialization Step (i): Choose $v_j \in X_j = \arg\min_{x \in \mathbb{R}} h_j(x)$.

Initialization Step (ii): Perform exact Byzantine consensus with $v_j$ as the input of agent $j$ to
the consensus algorithm; any exact Byzantine consensus algorithm [13] may be used for this
purpose.$^4$ Set $x_j[0]$ to the output of the above consensus algorithm.

Iteration t > 1: Step 1: Compute h'- ( Xj[t. — 1]), and perform Byzantine broadcast of h!- ( Xj[t — 1]) 
to all the agents, using any Byzantine broadcast algorithm, such as [11]. 

In Step 1, agent j should receive a gradient from each agent $i \in \mathcal{V}$; let us denote the
gradient received from agent i in iteration t as $g_i[t-1]$. Agent j keeps a record of the sequence
$(t, x_j[t])$. Agent j also keeps a record of the sequence $(t, g_i[t-1])$ for each agent i. Each
received gradient is checked for admissibility as follows, using the above record.

• If no gradient is, in fact, received from agent i in iteration t via a Byzantine broadcast
from i, then the gradient $g_i[t-1]$ for agent i is deemed inadmissible.

• If there exists an iteration $1 \le t_0 < t$ such that at least one of the following conditions
is true, then the gradient received from agent i is deemed inadmissible.

1. $x_j[t_0-1] < x_j[t-1]$ and $g_i[t_0-1] > g_i[t-1]$

2. $x_j[t_0-1] > x_j[t-1]$ and $g_i[t_0-1] < g_i[t-1]$

3. $|g_i[t-1]| > L$

If the gradient received from any agent i is deemed inadmissible, then it must be the case
that agent i is faulty. In that case, agent i is isolated (i.e., removed from the system). This
reduces the total number of agents n by 1, and the maximum number of faulty agents f
is also reduced by 1. Algorithm 3 is restarted (from Step 1) using the new parameters n
and f.$^5$ The gradients received from any non-faulty agent $i \in \mathcal{N}$ will never be found
to be inadmissible.

$^4$ For instance, consider the following algorithm: Each agent Byzantine broadcasts its $v_j$; collect n
such values from other agents; drop the smallest f values and the largest f. The output of consensus is
the average of the remaining $n - 2f$ values.

Step 2: Due to the restart mechanism above, the algorithm progresses to Step 2 only when all
the received gradients are deemed admissible. Let $\mathcal{R}[t-1]$ denote the multiset of admissible
gradients $\{g_1[t-1], g_2[t-1], \ldots, g_n[t-1]\}$ obtained in Step 1 of the t-th iteration. If there
are more than f positive gradients in $\mathcal{R}[t-1]$, then remove the f largest gradients from
$\mathcal{R}[t-1]$; otherwise, remove all positive gradients from $\mathcal{R}[t-1]$. Further, if there are
more than f negative gradients in $\mathcal{R}[t-1]$, then remove the f smallest gradients from
$\mathcal{R}[t-1]$; otherwise, remove all negative gradients from $\mathcal{R}[t-1]$. Let $\mathcal{R}^*[t-1]$
be the set of agents corresponding to all the remaining gradients.

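Step 3: Let $\{\lambda[t]\}_{t=0}^{\infty}$ be a sequence of diminishing (non-increasing and
$\lambda[t] \to 0$) stepsizes chosen beforehand such that $\lambda[t] > 0$ for each t,
$\sum_{t=0}^{\infty} \lambda[t] = \infty$ and $\sum_{t=0}^{\infty} \lambda^2[t] < \infty$.$^6$

Compute $x_j[t]$ as

$$x_j[t] = x_j[t-1] - \lambda[t-1] \sum_{i \in \mathcal{R}^*[t-1]} g_i[t-1].$$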
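The three steps of one iteration of Algorithm 3 can be sketched as follows. This is only an
illustrative sketch, not the authors' implementation: the function names (`is_admissible`, `trim`,
`update`), the dictionary keyed by agent id, and the use of `None` for a missing broadcast are
assumptions made here; Byzantine broadcast and the restart after isolating an agent are left out.

```python
from typing import Dict, List, Optional, Tuple


def is_admissible(history: List[Tuple[float, float]], x_now: float,
                  g_now: Optional[float], L: float) -> bool:
    """Step 1 test for one sender (hypothetical helper).

    history holds the earlier recorded pairs (x[t0 - 1], g_i[t0 - 1]).
    """
    if g_now is None:                 # no gradient was broadcast this iteration
        return False
    if abs(g_now) > L:                # condition 3: violates the Lipschitz bound
        return False
    for x_old, g_old in history:
        if x_old < x_now and g_old > g_now:   # condition 1
            return False
        if x_old > x_now and g_old < g_now:   # condition 2
            return False
    return True


def trim(gradients: Dict[int, float], f: int) -> List[int]:
    """Step 2: agents whose gradients survive the trimming rule."""
    positive = sorted((i for i in gradients if gradients[i] > 0),
                      key=lambda i: gradients[i], reverse=True)
    negative = sorted((i for i in gradients if gradients[i] < 0),
                      key=lambda i: gradients[i])
    # Slicing with [:f] removes the f largest positive values when more than
    # f exist, and all positive values otherwise (same for negative values).
    removed = set(positive[:f]) | set(negative[:f])
    return [i for i in gradients if i not in removed]


def update(x_prev: float, gradients: Dict[int, float], f: int,
           step: float) -> float:
    """Step 3: gradient step using the sum of the surviving gradients."""
    kept = trim(gradients, f)
    return x_prev - step * sum(gradients[i] for i in kept)
```

For example, with gradients `{1: 3.0, 2: 1.0, 3: -2.0, 4: 0.5}` and f = 1, the gradients of agents 1
and 3 are trimmed, and the update uses the sum of the gradients of agents 2 and 4.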


We prove that Algorithm 3 solves Problem 3 with $\gamma = |\mathcal{N}| - f$. In fact, we will see later
that Algorithm 3 is essentially the gradient method for the constrained convex optimization problem (19).
Then the remaining correctness proof follows the standard convergence analysis of such algorithms
[3,18].

Lemma 3. The consensus value obtained at the end of Initialization Step (ii) of Algorithm 3 is
contained in $\mathrm{Cov}(\cup_{i \in \mathcal{N}} X_i)$, i.e., $x[0] \in \mathrm{Cov}(\cup_{i \in \mathcal{N}} X_i)$.

The above lemma follows trivially from the validity condition imposed on a correct Byzantine consensus
algorithm. Thus the proof is omitted here.

Let $[a,b] = \mathrm{Cov}(\cup_{i \in \mathcal{N}} X_i)$. Recall that all functions $h_i(x)$ are
L-Lipschitz continuous, i.e.,

$$|h_i'(x)| \le L.$$

Proposition 4. In Algorithm 3, $x_i[t] = x_j[t]$ for all $i, j \in \mathcal{N}$ and for all t. In addition,
$x_i[t] \in [a - n\lambda[0]L,\ b + n\lambda[0]L]$ for any t.

Proof. Recall that admissible cost functions are L-Lipschitz. We prove this proposition by induction.
By the correctness of a Byzantine consensus algorithm, $x_i[0] = x_j[0]$ for all $i, j \in \mathcal{N}$.
Assume that $x_i[t-1] = x_j[t-1]$ for all $i, j \in \mathcal{N}$ for some $t > 0$. At iteration t,
non-faulty agents i and j receive an identical set of gradients $\mathcal{R}[t-1]$ in Step 1 of
iteration t. This, along with the assumption that $x_i[t-1] = x_j[t-1]$, implies that $x_i[t] = x_j[t]$.


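$^5$ It is also possible to continue executing the algorithm further, but for brevity, we take the
approach of eliminating the faulty agent, and restarting.

$^6$ For instance, the stepsizes $\{\lambda[t] = \frac{1}{t+1}\}_{t=0}^{\infty}$ satisfy the aforementioned
condition, since $\lambda[t] = \frac{1}{t+1} \ge \frac{1}{t+2} = \lambda[t+1]$,
$\sum_{t=0}^{\infty} \frac{1}{t+1} = \infty$ and $\sum_{t=0}^{\infty} \frac{1}{(t+1)^2} < \infty$.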
Now, by induction, we show that $x_i[t] \in [a - n\lambda[0]L,\ b + n\lambda[0]L]$ for any
$i \in \mathcal{N}$ and for any $t \ge 0$. By the validity condition of a correct consensus algorithm,
it holds that $x_i[0] \in \mathrm{Cov}(\cup_{i \in \mathcal{N}} X_i) = [a, b]$. Assume that
$x_i[t-1] \in [a - n\lambda[0]L,\ b + n\lambda[0]L]$ for some $t > 0$. We will show that

$$x_i[t] \in [a - n\lambda[0]L,\ b + n\lambda[0]L] \qquad (22)$$

is also true. We know
$[a - n\lambda[0]L,\ b + n\lambda[0]L] = [a - n\lambda[0]L,\ a) \cup [a, b] \cup (b,\ b + n\lambda[0]L]$.

When $x_i[t-1] \in [a, b] = \mathrm{Cov}(\cup_{i \in \mathcal{N}} X_i)$, then

$$x_i[t] = x_i[t-1] - \lambda[t-1] \sum_{k \in \mathcal{R}^*[t-1]} g_k[t-1]
\overset{(a)}{\le} x_i[t-1] + n\lambda[0]L \le b + n\lambda[0]L,$$

where (a) follows due to the admissibility test (3) in Step 1 above and because
$\lambda[0] \ge \lambda[t]$ for $t \ge 0$. Similarly, we have

$$x_i[t] \ge a - n\lambda[0]L.$$

Thus when $x_i[t-1] \in [a, b]$, then $x_i[t] \in [a - n\lambda[0]L,\ b + n\lambda[0]L]$.

When $x_i[t-1] \in (b,\ b + n\lambda[0]L]$, all non-faulty gradients are positive since
$x_i[t-1] > b = \max_{j \in \mathcal{N}} \max X_j$. Thus, at most f admissible gradients are non-positive.
By the code in Algorithm 3, all the negative admissible gradients will be removed. Thus
$g_k[t-1] \ge 0$ for each $k \in \mathcal{R}^*[t-1]$, and

$$x_i[t] = x_i[t-1] - \lambda[t-1] \sum_{k \in \mathcal{R}^*[t-1]} g_k[t-1] \le x_i[t-1] \le b + n\lambda[0]L.$$

In addition,

$$x_i[t] = x_i[t-1] - \lambda[t-1] \sum_{k \in \mathcal{R}^*[t-1]} g_k[t-1]
\ge x_i[t-1] - n\lambda[0]L \ge a - n\lambda[0]L,$$

because $x_i[t-1] > b \ge a$ and $g_k[t-1] \le L$ for each $k \in \mathcal{R}^*[t-1]$.

Thus when $x_i[t-1] \in (b,\ b + n\lambda[0]L]$, $x_i[t] \in [a - n\lambda[0]L,\ b + n\lambda[0]L]$.
Similarly, we can show that when $x_i[t-1] \in [a - n\lambda[0]L,\ a)$,
$x_i[t] \in [a - n\lambda[0]L,\ b + n\lambda[0]L]$.

Therefore, we conclude that $x_i[t] \in [a - n\lambda[0]L,\ b + n\lambda[0]L]$ and the induction is
complete. □


Henceforth, with an abuse of notation, we drop the subscript j of $x_j[t]$ for each j and t. Similarly,
we drop the time index [0] of $\lambda[0]$.

Theorem 8. For any $i \in \mathcal{F}$, let $\{g_i[t-1]\}_{t=1}^{\infty}$ be the sequence of admissible
gradients generated by Algorithm 3, where $g_i[t-1]$ is supposed to be the gradient at $x[t-1]$. Then
there exists a function $g(x)$ defined over $[c,d]$, which contains the points $a - n\lambda L$ and
$b + n\lambda L$ as interior points, such that (i) $g'(x[t-1]) = g_i[t-1]$, and (ii) $g(x)$ is convex,
L-Lipschitz, and differentiable.

Proof. When $i \in \mathcal{F}$, let $g^+ = \sup \{g_i[t-1]\}_{t=1}^{\infty}$,
$g^- = \inf \{g_i[t-1]\}_{t=1}^{\infty}$, and consider the piecewise linear function with
$\{(x[t-1], g_i[t-1])\}_{t=1}^{\infty} \cup \{(c, g^-), (d, g^+)\}$ as corners, denoted by $\hat g(x)$.
It is easy to see that the function $\hat g(x)$ is continuous. Recall that $[c, d]$ is a closed interval.
Thus $\hat g(x)$ is Riemann integrable over $[c, d]$. Choose $g(x)$ to be the integral function of
$\hat g(x)$, i.e., $g(x) = \int_c^x \hat g(t)\,dt$.

Since $\hat g(t)$ is continuous over $[c, d]$, it holds that $g(x)$ is differentiable and
$g'(x) = \hat g(x)$ for $x \in (c, d)$. In addition, when $\{g_i[t-1]\}_{t=1}^{\infty}$ are admissible,
by definition of admissible gradients (in Algorithm 3), the function $\hat g(x)$ is non-decreasing.
Then the function $g(x)$ is convex over $[c, d]$. The L-Lipschitz property is also ensured by
admissibility of the gradients. □
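The construction in the proof can be illustrated numerically for a finite record of gradients. The
sketch below is an assumption-laden illustration, not part of the original report: it assumes the
recorded states are distinct and lie strictly inside $(c, d)$, replaces sup and inf by max and min over
the finite record, and uses the hypothetical names `build_corners`, `g_hat` and `g_int`.

```python
from bisect import bisect_right
from typing import List, Tuple

Point = Tuple[float, float]


def build_corners(record: List[Point], c: float, d: float) -> List[Point]:
    """Corners of the piecewise-linear derivative: the recorded
    (state, gradient) pairs plus the end points (c, g-) and (d, g+)."""
    points = dict(record)                      # assumes distinct states
    points[c] = min(g for _, g in record)
    points[d] = max(g for _, g in record)
    return sorted(points.items())


def g_hat(corners: List[Point], x: float) -> float:
    """Linear interpolation between neighbouring corners."""
    xs = [p[0] for p in corners]
    k = max(0, min(bisect_right(xs, x) - 1, len(corners) - 2))
    (x0, y0), (x1, y1) = corners[k], corners[k + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def g_int(corners: List[Point], x: float) -> float:
    """Integral of g_hat from c to x; trapezoids are exact on each
    linear piece."""
    total = 0.0
    for (x0, y0), (x1, _) in zip(corners, corners[1:]):
        if x <= x0:
            break
        hi = min(x, x1)
        total += 0.5 * (y0 + g_hat(corners, hi)) * (hi - x0)
    return total
```

When the record is admissible, the corners are non-decreasing in the gradient coordinate, so `g_hat` is
non-decreasing and `g_int` is convex on $[c, d]$, with derivative equal to the recorded gradient at each
recorded state.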
It is easy to see that there exists an admissible function $\tilde g(\cdot)$ such that the restriction
of $\tilde g(\cdot)$ to $[c, d]$ equals $g(\cdot)$, i.e., $\tilde g|_{[c,d]}(\cdot) = g(\cdot)$. Thus,
Theorem 8 states that the record of gradients saved in Algorithm 3 effectively forces each Byzantine
agent (that is not already isolated) to behave as if it is non-faulty with a local cost function
$\tilde g(x)$ that is admissible. Therefore, hereafter we can assume that all agents, including faulty
agents, behave correctly and consistently with an admissible local cost function.

Recall that we defined $[a, b] = \mathrm{Cov}(\cup_{i \in \mathcal{N}} X_i)$. By Proposition 4, we know
that the local estimate of each non-faulty agent i is trapped within the closed interval
$[a - n\lambda[0]L,\ b + n\lambda[0]L]$ for all iterations, i.e.,
$x_i[t] \in [a - n\lambda[0]L,\ b + n\lambda[0]L]$ for all $i \in \mathcal{N}$ and all t. Therefore,
Algorithm 3 is essentially trying to find an (exact or approximate) optimum of the following constrained
convex optimization problem, which is a variant of (19):

$$\min\ H(x) \quad \text{s.t.} \quad x \in [a - nL\lambda[0],\ b + nL\lambda[0]].$$

It should be easy to see that the total gradient $\sum_{i \in \mathcal{R}^*[t-1]} g_i[t-1]$ used in
computing $x_j[t]$ is identical to $F(x_j[t-1]) + G(x_j[t-1])$, which is the gradient of $H(\cdot)$ at
$x_j[t-1]$. In other words, the agents are distributedly using the gradient method for convex
optimization of the global cost function $H(\cdot)$, which is convex and continuous.

Following the convergence analysis of the gradient method in Theorem 3.2.2 in [18] and Theorem
41 in [17], we can show that the limit of $\{x[t]\}_{t=0}^{\infty}$ exists and
$\lim_{t \to \infty} x[t] = x^*$, where $x^*$ is an optimum of the function $H(\cdot)$.

Remark 2. The gradient-trimming mechanism in Step 2 of Algorithm 3 can be replaced by the
following trimming rule: “Remove the f largest gradients from $\mathcal{R}[t-1]$ and remove the f
smallest gradients from $\mathcal{R}[t-1]$.” This trimming rule leaves $n - 2f$ admissible gradients.
The modified algorithm is then a distributed version of Algorithm 2 and its correctness can be shown
analogously to that of Algorithm 3.

4.4 Algorithm 4: Non-Interleaved Iterative Algorithm I 

Recall that in Algorithm 3, the entire sequence $x_j[t]$ is saved by agent j. In the following two
subsections, we relax this memory requirement by proposing two non-interleaved algorithms, in
particular, Algorithms 4 and 5. Unlike Algorithms 1, 2, 3 and 5, the performance guarantee of
Algorithm 4 is independent of $|\mathcal{N}|$. In general, the performance of Algorithm 4 is weaker
than that of Algorithms 1, 2, 3 and 5.


Algorithm 4 for agent j: 

Initialization Step (i): Choose $x_j[0] \in X_j = \operatorname{argmin}_{x \in \mathbb{R}} h_j(x)$.

Initialization Step (ii): Perform exact Byzantine consensus with $x_j[0]$ as the input of agent j.
Set $x[0]$ to the output of the above consensus algorithm.

Iteration $t \ge 1$: Step 1: Compute $h_j'(x_j[t-1])$, and perform Byzantine broadcast of
$h_j'(x_j[t-1])$ to all the agents, using any Byzantine broadcast algorithm, such as [11].

In Step 1, agent j should receive a gradient from each agent $i \in \mathcal{V}$; let us denote the
gradient received from agent i in iteration t as $g_i[t-1]$.$^7$ If no gradient is, in fact, received
from agent i in iteration t via a Byzantine broadcast from i, then it must be the case that agent i is
faulty. In that case, agent i is isolated (i.e., removed from the system).$^8$ This reduces the
total number of agents n by 1, and the maximum number of faulty agents f is also reduced
by 1. Algorithm 4 is restarted (from Step 1) using the new parameters n and f.

Step 2: Due to the restart mechanism above, the algorithm progresses to Step 2 only when
agent j has received gradients from all other nodes. Let $\mathcal{R}[t-1]$ denote the multiset of the
gradients $\{g_1[t-1], g_2[t-1], \ldots, g_n[t-1]\}$ obtained in Step 1 of the t-th iteration. Drop the
smallest f values and the largest f. Set $g[t-1]$ to the average of the remaining $n - 2f$ values.

Step 3: Let $\{\lambda[t]\}_{t=0}^{\infty}$ be a sequence of diminishing (non-increasing and
$\lambda[t] \to 0$) stepsizes chosen beforehand such that $\lambda[t] > 0$ for each t,
$\sum_{t=0}^{\infty} \lambda[t] = \infty$ and $\sum_{t=0}^{\infty} \lambda^2[t] < \infty$.

Compute $x[t]$ as

$$x[t] = x[t-1] - \lambda[t-1]\, g[t-1]. \qquad (23)$$
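Steps 2 and 3 of Algorithm 4 amount to a trimmed mean followed by a gradient step. A minimal sketch,
assuming the n gradients of the current iteration are already available as a list of floats (the
Byzantine broadcast and the restart logic are omitted, and the names `trimmed_mean` and
`algorithm4_step` are chosen here for illustration only):

```python
from typing import List


def trimmed_mean(values: List[float], f: int) -> float:
    """Drop the f smallest and f largest values, average the rest."""
    n = len(values)
    if n <= 2 * f:
        raise ValueError("need more than 2f values")
    kept = sorted(values)[f:n - f]
    return sum(kept) / len(kept)


def algorithm4_step(x_prev: float, gradients: List[float], f: int,
                    step: float) -> float:
    """One application of update (23) with the trimmed-mean gradient."""
    return x_prev - step * trimmed_mean(gradients, f)
```

The same trimmed mean is the operation described in the example consensus algorithm of footnote 4,
applied there to the values $v_j$ rather than to gradients.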


Since $x[0]$ is the consensus value of the input $x_j[0]$ for each $j \in \mathcal{N}$, and $g[0]$ is
the consensus value of the non-faulty gradient $h_j'(x[0])$, for each $j \in \mathcal{N}$, the update
function (23) is well-defined for t = 1. By an inductive argument, we can show that the update (23) is
well-defined for all t.

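Let $\mathcal{C}$ be the collection of functions defined as follows:

$$\mathcal{C} = \Big\{ p(x) : p(x) = \sum_{i \in \mathcal{N}} \alpha_i h_i(x),\ \forall i \in \mathcal{N},\
\alpha_i \ge 0,\ \sum_{i \in \mathcal{N}} \alpha_i = 1,\ \text{and}\
\sum_{i \in \mathcal{N}} \mathbf{1}\Big(\alpha_i \ge \frac{1}{2(n-f)}\Big) \ge n - 2f \Big\} \qquad (24)$$

Each $p(x) \in \mathcal{C}$ is called a valid function. Note that the function
$\frac{1}{|\mathcal{N}|} \sum_{i \in \mathcal{N}} h_i(x) \in \mathcal{C}$. For ease of future reference,
we let $\tilde p(x) = \frac{1}{|\mathcal{N}|} \sum_{i \in \mathcal{N}} h_i(x)$. Define
$Y = \cup_{p(x) \in \mathcal{C}} \operatorname{argmin} p(x)$.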

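Lemma 4. Y is a convex set.

Proof. Let $x_1, x_2 \in Y$ such that $x_1 \ne x_2$. By definition of Y, there exist valid functions
$p_1(x) = \sum_{i \in \mathcal{N}} \alpha_i h_i(x) \in \mathcal{C}$ and
$p_2(x) = \sum_{i \in \mathcal{N}} \beta_i h_i(x) \in \mathcal{C}$ such that
$x_1 \in \operatorname{argmin} p_1(x)$ and $x_2 \in \operatorname{argmin} p_2(x)$, respectively. Note
that it is possible that $p_1(\cdot) = p_2(\cdot)$, and that $p_i(\cdot) = \tilde p(\cdot)$ for
i = 1 or i = 2.

Given $0 \le \alpha \le 1$, let $x_\alpha = \alpha x_1 + (1 - \alpha) x_2$. We consider two cases:

(i) $x_\alpha \in \operatorname{argmin} p_1(x) \cup \operatorname{argmin} p_2(x) \cup \operatorname{argmin} \tilde p(x)$, and

(ii) $x_\alpha \notin \operatorname{argmin} p_1(x) \cup \operatorname{argmin} p_2(x) \cup \operatorname{argmin} \tilde p(x)$.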


$^7$ For each $i \in \mathcal{N}$, $g_i[t-1] = h_i'(x[t-1])$.

$^8$ Alternatively, the gradient value can be replaced by some default value (say 0).
Case (i): $x_\alpha \in \operatorname{argmin} p_1(x) \cup \operatorname{argmin} p_2(x) \cup \operatorname{argmin} \tilde p(x)$.
In this case, by definition of Y, we have

$$x_\alpha \in \operatorname{argmin} p_1(x) \cup \operatorname{argmin} p_2(x) \cup \operatorname{argmin} \tilde p(x) \subseteq Y.$$

Thus, $x_\alpha \in Y$.

Case (ii): $x_\alpha \notin \operatorname{argmin} p_1(x) \cup \operatorname{argmin} p_2(x) \cup \operatorname{argmin} \tilde p(x)$.
By symmetry, WLOG, assume that $x_1 < x_2$. By definition of $x_\alpha$, it holds that
$x_1 \le x_\alpha \le x_2$. By assumption of case (ii), it must be that
$x_\alpha > \max(\operatorname{argmin} p_1(x))$ and $x_\alpha < \min(\operatorname{argmin} p_2(x))$,
which imply that $p_1'(x_\alpha) > 0$ and $p_2'(x_\alpha) < 0$.

There are two possibilities for $\tilde p'(x_\alpha)$ (the gradient of $\tilde p$ at $x_\alpha$):
$\tilde p'(x_\alpha) < 0$ or $\tilde p'(x_\alpha) > 0$. Note that $\tilde p'(x_\alpha) \ne 0$ because
$x_\alpha \notin \operatorname{argmin} \tilde p(x)$.


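When $\tilde p'(x_\alpha) < 0$, there exists $0 \le \zeta \le 1$ such that

$$\zeta\, p_1'(x_\alpha) + (1 - \zeta)\, \tilde p'(x_\alpha) = 0.$$

By definition of $p_1(x)$ and $\tilde p(x)$, we have

$$0 = \zeta\, p_1'(x_\alpha) + (1 - \zeta)\, \tilde p'(x_\alpha)
= \zeta \Big( \sum_{i \in \mathcal{N}} \alpha_i h_i'(x_\alpha) \Big)
+ (1 - \zeta) \frac{1}{|\mathcal{N}|} \sum_{i \in \mathcal{N}} h_i'(x_\alpha)
= \sum_{i \in \mathcal{N}} \Big( \alpha_i \zeta + (1 - \zeta) \frac{1}{|\mathcal{N}|} \Big) h_i'(x_\alpha).$$

Thus, $x_\alpha$ is an optimum of the function

$$\sum_{i \in \mathcal{N}} \Big( \alpha_i \zeta + (1 - \zeta) \frac{1}{|\mathcal{N}|} \Big) h_i(x).$$

Let $\mathcal{I}$ be the collection of indices defined by

$$\mathcal{I} = \Big\{ i : i \in \mathcal{N},\ \text{and}\
\alpha_i \zeta + (1 - \zeta) \frac{1}{|\mathcal{N}|} \ge \frac{1}{2(n-f)} \Big\}.$$

Next we show that $|\mathcal{I}| \ge n - 2f$. Let $\mathcal{I}_1$ be the collection of indices defined by

$$\mathcal{I}_1 = \Big\{ i : i \in \mathcal{N},\ \text{and}\ \alpha_i \ge \frac{1}{2(n-f)} \Big\}. \qquad (25)$$

Since $p_1(x) \in \mathcal{C}$, then $|\mathcal{I}_1| \ge n - 2f$. In addition, since $n > 3f$,
$|\mathcal{N}| < 2(n - f)$.$^9$ Then, for each $j \in \mathcal{I}_1$, we have

$$\alpha_j \zeta + (1 - \zeta) \frac{1}{|\mathcal{N}|}
\ge \zeta \frac{1}{2(n-f)} + (1 - \zeta) \frac{1}{|\mathcal{N}|}
\ge \zeta \frac{1}{2(n-f)} + (1 - \zeta) \frac{1}{2(n-f)}
= \frac{1}{2(n-f)},$$

i.e., $j \in \mathcal{I}$. Thus, $\mathcal{I}_1 \subseteq \mathcal{I}$.

Since $|\mathcal{I}_1| \ge n - 2f$, we have $|\mathcal{I}| \ge n - 2f$. So the function
$\sum_{i \in \mathcal{N}} \big( \alpha_i \zeta + (1 - \zeta) \frac{1}{|\mathcal{N}|} \big) h_i(x)$, of
which $x_\alpha$ is an optimum, is a valid function. Thus, $x_\alpha \in Y$.

Similarly, we can show that the above result holds when $\tilde p'(x_\alpha) > 0$.

Therefore, set Y is convex. □

$^9$ In fact, $n > 2f$ suffices for this.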
Lemma 5. For each iteration t, there exists a valid function $p(x) \in \mathcal{C}$ such that
$g[t-1] = p'(x[t-1])$.

Proof. Let $\mathcal{R}^*[t-1]$ denote the set of nodes from whom the remaining $n - 2f$ values were
received in iteration t, and let us denote by $\mathcal{L}[t-1]$ and $\mathcal{S}[t-1]$ the set of nodes
from whom the largest f values and the smallest f values were received in iteration t. Due to the fact
that each value is transmitted using Byzantine broadcast, $\mathcal{R}^*[t-1]$, $\mathcal{L}[t-1]$ and
$\mathcal{S}[t-1]$ do not depend on j, for $j \in \mathcal{N}$. Thus, $\mathcal{R}^*[t-1]$,
$\mathcal{L}[t-1]$ and $\mathcal{S}[t-1]$ are well-defined.


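By definition of $g[t-1]$, we have

$$g[t-1] = \frac{1}{n - 2f} \Big( \sum_{j \in \mathcal{R}^*[t-1] - \mathcal{F}} g_j[t-1]
+ \sum_{j \in \mathcal{R}^*[t-1] \cap \mathcal{F}} g_j[t-1] \Big)
= \frac{1}{n - 2f} \Big( \sum_{j \in \mathcal{R}^*[t-1] - \mathcal{F}} h_j'(x[t-1])
+ \sum_{j \in \mathcal{R}^*[t-1] \cap \mathcal{F}} g_j[t-1] \Big), \qquad (26)$$

since for all $i \in \mathcal{N}$, $g_i[t-1] = h_i'(x[t-1])$.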


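We consider two cases: (i) $\mathcal{R}^*[t-1] \cap \mathcal{F} = \emptyset$ and
(ii) $\mathcal{R}^*[t-1] \cap \mathcal{F} \ne \emptyset$.

Case (i): $\mathcal{R}^*[t-1] \cap \mathcal{F} = \emptyset$. When
$\mathcal{R}^*[t-1] \cap \mathcal{F} = \emptyset$, it holds that $\mathcal{R}^*[t-1] \subseteq \mathcal{N}$.
Define $p(x) = \frac{1}{n - 2f} \sum_{j \in \mathcal{R}^*[t-1]} h_j(x)$. We have

$$p'(x[t-1]) = \frac{1}{n - 2f} \sum_{j \in \mathcal{R}^*[t-1]} h_j'(x[t-1]) = g[t-1].$$

In addition, in the function $p(x)$, $n - 2f$ component functions, corresponding to functions in
$\mathcal{R}^*[t-1]$, have weights $\frac{1}{n - 2f} \ge \frac{1}{2(n-f)}$. Thus
$p(x) \in \mathcal{C}$. This proves Lemma 5 for $\mathcal{R}^*[t-1] \cap \mathcal{F} = \emptyset$.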


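Case (ii): $\mathcal{R}^*[t-1] \cap \mathcal{F} \ne \emptyset$. Let
$|\mathcal{R}^*[t-1] \cap \mathcal{F}| = \theta$, and let
$\mathcal{L}^*[t-1] \subseteq \mathcal{L}[t-1] \cap \mathcal{N}$ and
$\mathcal{S}^*[t-1] \subseteq \mathcal{S}[t-1] \cap \mathcal{N}$ be such that
$|\mathcal{L}^*[t-1]| = |\mathcal{S}^*[t-1]| = \theta$. Since $|\mathcal{F}| \le f$ and
$|\mathcal{R}^*[t-1] \cap \mathcal{F}| = \theta$, it follows that
$|\mathcal{L}[t-1] \cap \mathcal{F}| \le f - \theta$ and
$|\mathcal{S}[t-1] \cap \mathcal{F}| \le f - \theta$. We have

$$|\mathcal{L}[t-1] \cap \mathcal{N}| = |\mathcal{L}[t-1]| - |\mathcal{L}[t-1] \cap \mathcal{F}|
\ge f - (f - \theta) = \theta,$$

and

$$|\mathcal{S}[t-1] \cap \mathcal{N}| = |\mathcal{S}[t-1]| - |\mathcal{S}[t-1] \cap \mathcal{F}|
\ge f - (f - \theta) = \theta.$$

Thus $\mathcal{L}^*[t-1]$ and $\mathcal{S}^*[t-1]$ are well-defined.

By definition of $\mathcal{L}^*[t-1]$, $\mathcal{S}^*[t-1]$ and $\mathcal{R}^*[t-1] \cap \mathcal{F}$,
it follows that for each $i \in \mathcal{L}^*[t-1]$, $j \in \mathcal{S}^*[t-1]$ and
$k \in \mathcal{R}^*[t-1] \cap \mathcal{F}$,

$$h_i'(x[t-1]) \ge g_k[t-1] \ge h_j'(x[t-1]).$$

Since $|\mathcal{L}^*[t-1]| = |\mathcal{S}^*[t-1]| = |\mathcal{R}^*[t-1] \cap \mathcal{F}| = \theta$,
we have

$$\sum_{i \in \mathcal{L}^*[t-1]} h_i'(x[t-1]) \ge \sum_{k \in \mathcal{R}^*[t-1] \cap \mathcal{F}} g_k[t-1]
\ge \sum_{j \in \mathcal{S}^*[t-1]} h_j'(x[t-1]).$$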
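Thus, there exists $0 \le \zeta \le 1$ such that

$$\sum_{k \in \mathcal{R}^*[t-1] \cap \mathcal{F}} g_k[t-1]
= \zeta \Big( \sum_{i \in \mathcal{L}^*[t-1]} h_i'(x[t-1]) \Big)
+ (1 - \zeta) \Big( \sum_{j \in \mathcal{S}^*[t-1]} h_j'(x[t-1]) \Big).$$

So (26) can be rewritten as

$$g[t-1] = \frac{1}{n - 2f} \Big( \sum_{j \in \mathcal{R}^*[t-1] - \mathcal{F}} h_j'(x[t-1])
+ \zeta \sum_{i \in \mathcal{L}^*[t-1]} h_i'(x[t-1])
+ (1 - \zeta) \sum_{j \in \mathcal{S}^*[t-1]} h_j'(x[t-1]) \Big).$$

Define $q(x)$ as

$$q(x) = \frac{1}{n - 2f} \Big( \sum_{j \in \mathcal{R}^*[t-1] - \mathcal{F}} h_j(x)
+ \zeta \sum_{i \in \mathcal{L}^*[t-1]} h_i(x)
+ (1 - \zeta) \sum_{j \in \mathcal{S}^*[t-1]} h_j(x) \Big).$$

By definition of $q(x)$, it holds that $g[t-1] = q'(x[t-1])$. Next we show $q(x) \in \mathcal{C}$.
Since $|\mathcal{L}^*[t-1]| = |\mathcal{S}^*[t-1]| = |\mathcal{R}^*[t-1] \cap \mathcal{F}| = \theta$,
it holds that

$$\frac{1}{n - 2f} \Big( \sum_{j \in \mathcal{R}^*[t-1] - \mathcal{F}} 1
+ \zeta \sum_{i \in \mathcal{L}^*[t-1]} 1 + (1 - \zeta) \sum_{j \in \mathcal{S}^*[t-1]} 1 \Big)
= \frac{1}{n - 2f} \Big( \sum_{j \in \mathcal{R}^*[t-1] - \mathcal{F}} 1
+ \zeta \sum_{j \in \mathcal{R}^*[t-1] \cap \mathcal{F}} 1
+ (1 - \zeta) \sum_{j \in \mathcal{R}^*[t-1] \cap \mathcal{F}} 1 \Big)$$

$$= \frac{1}{n - 2f} \Big( \sum_{j \in \mathcal{R}^*[t-1] - \mathcal{F}} 1
+ \sum_{j \in \mathcal{R}^*[t-1] \cap \mathcal{F}} 1 \Big)
= \frac{1}{n - 2f} \sum_{j \in \mathcal{R}^*[t-1]} 1 = 1.$$

Thus the function $q(x)$ is a convex combination of $h_i(x)$ for $i \in \mathcal{N}$.

When $\zeta \ge \frac{1}{2}$, it holds that all component functions in
$(\mathcal{R}^*[t-1] - \mathcal{F}) \cup \mathcal{L}^*[t-1]$ have weights bounded below by
$\frac{1}{2(n-f)}$. In addition,

$$|(\mathcal{R}^*[t-1] - \mathcal{F}) \cup \mathcal{L}^*[t-1]|
= |\mathcal{R}^*[t-1] - \mathcal{F}| + |\mathcal{L}^*[t-1]|
= |\mathcal{R}^*[t-1]| - |\mathcal{R}^*[t-1] \cap \mathcal{F}| + |\mathcal{L}^*[t-1]|
= n - 2f - \theta + \theta = n - 2f.$$

The case $\zeta < \frac{1}{2}$ is symmetric, with $\mathcal{S}^*[t-1]$ in place of $\mathcal{L}^*[t-1]$.

Thus, $q(x) \in \mathcal{C}$. This completes the proof. □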
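The construction used in this proof can be checked numerically on a small instance. The sketch below is
an illustration under stated assumptions, not the authors' code: each agent is represented by a pair
(reported gradient, faulty flag), at most f agents are faulty, and the function name `honest_weights`
is hypothetical. It returns the trimmed mean together with the weights that the proof assigns to the
non-faulty agents.

```python
from typing import Dict, List, Tuple


def honest_weights(grads: List[Tuple[float, bool]],
                   f: int) -> Tuple[float, Dict[int, float]]:
    """grads[i] = (gradient reported by agent i, True if agent i is faulty)."""
    n = len(grads)
    order = sorted(range(n), key=lambda i: grads[i][0])
    small, middle, large = order[:f], order[f:n - f], order[n - f:]
    g_avg = sum(grads[i][0] for i in middle) / (n - 2 * f)

    faulty_mid = [i for i in middle if grads[i][1]]
    theta = len(faulty_mid)
    # theta non-faulty agents from each trimmed side; they exist when at
    # most f agents are faulty.
    l_star = [i for i in large if not grads[i][1]][:theta]
    s_star = [i for i in small if not grads[i][1]][:theta]

    sum_f = sum(grads[i][0] for i in faulty_mid)
    sum_l = sum(grads[i][0] for i in l_star)
    sum_s = sum(grads[i][0] for i in s_star)
    # zeta expresses the faulty sum as a convex combination of the two
    # non-faulty sums that bracket it.
    zeta = 1.0 if sum_l == sum_s else (sum_f - sum_s) / (sum_l - sum_s)

    weights = {i: 1.0 / (n - 2 * f) for i in middle if not grads[i][1]}
    for i in l_star:
        weights[i] = zeta / (n - 2 * f)
    for i in s_star:
        weights[i] = (1.0 - zeta) / (n - 2 * f)
    return g_avg, weights
```

With n = 7, f = 2, non-faulty gradients -3, -1, 0, 2, 4 and faulty gradients 1 and 100, the weights
sum to 1 and the weighted sum of the non-faulty gradients equals the trimmed mean.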

Definition 1. Given a point x and a nonempty set C, the distance of point x to set C, denoted by
Dist(x, C), is defined by

$$\mathrm{Dist}(x, C) = \inf_{y \in C} \|x - y\|.$$
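As a small illustration of Definition 1 on the real line, the sketch below computes the distance from
a point to a closed interval [lo, hi]. Treating the set as a closed interval is an assumption made
here for simplicity (Lemma 4 only states that Y is convex), and the name `dist_to_interval` is
hypothetical.

```python
def dist_to_interval(x: float, lo: float, hi: float) -> float:
    """Distance from x to the closed interval [lo, hi]; zero inside it."""
    if lo > hi:
        raise ValueError("empty interval")
    return max(lo - x, 0.0, x - hi)
```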

Theorem 9. When $n > 3f$, Algorithm 4 solves Problem 3 with $\beta = \frac{1}{2(n-f)}$ and
$\gamma = n - 2f$.

Proof. Let $\{x[t]\}_{t=1}^{\infty}$ be the sequence of estimates generated by Algorithm 4. To show this
theorem, it is enough to show that the limit of $\mathrm{Dist}(x[t], Y)$ exists and

$$\lim_{t \to \infty} \mathrm{Dist}(x[t], Y) = 0.$$

We say that an element $x[t]$ is a resilient point if the conditions in one of the following items hold
true:

* $x[t-1] \in Y$ and $x[t] \notin Y$,

* $x[t-1] > \sup Y$ and $x[t] < \inf Y$,

* $x[t-1] < \inf Y$ and $x[t] > \sup Y$.

We consider two cases: (i) there are infinitely many resilient points in $\{x[t]\}_{t=1}^{\infty}$, and
(ii) there are finitely many resilient points in $\{x[t]\}_{t=1}^{\infty}$.

Case (i): There are infinitely many resilient points in $\{x[t]\}_{t=1}^{\infty}$. Let
$\{x[t_i]\}_{i=1}^{\infty}$ be the maximal subsequence of $\{x[t]\}_{t=1}^{\infty}$ such that each
$x[t_i]$ is a resilient point.

Recall that in the update function (23),

$$x[t] = x[t-1] - \lambda[t-1]\, g[t-1].$$

For each resilient point $x[t_i]$, it holds that

$$\mathrm{Dist}(x[t_i], Y) \le |\lambda[t_i - 1]\, g[t_i - 1]| \le \lambda[t_i - 1] L.$$

Thus,

$$\limsup_{i \to \infty} \mathrm{Dist}(x[t_i], Y) \le \limsup_{i \to \infty} \lambda[t_i - 1] L
= \Big( \limsup_{i \to \infty} \lambda[t_i - 1] \Big) L \overset{(a)}{=} 0.$$

Equality (a) follows from the fact that the stepsize $\lambda[t]$ is diminishing, i.e.,
$\lim_{t \to \infty} \lambda[t] = 0$.

By definition, for each j such that $t_i < j < t_{i+1}$, the element $x[j]$ is not a resilient point.
Thus, $x[j] > \sup Y$ for each $t_i < j < t_{i+1}$, or $x[j] < \inf Y$ for each $t_i < j < t_{i+1}$, or
$x[j] \in Y$ for each $t_i < j < t_{i+1}$. By the update function (23), we have

$$\mathrm{Dist}(x[j], Y) = \max\{0,\ \mathrm{Dist}(x[j-1], Y) - \lambda[j-1]\, |g[j-1]|\}$$
$$\le \max\{0,\ \mathrm{Dist}(x[j-1], Y)\} \quad \text{since } |g[j-1]| \ge 0$$
$$= \mathrm{Dist}(x[j-1], Y) \quad \text{since } \mathrm{Dist}(x[j-1], Y) \ge 0$$

for each $t_i < j < t_{i+1}$ and each i, and in all three possible scenarios above. Consequently, we have

$$\mathrm{Dist}(x[j], Y) \le \mathrm{Dist}(x[t_i], Y).$$

Thus,

$$\limsup_{j \to \infty} \mathrm{Dist}(x[j], Y) \le \limsup_{i \to \infty} \mathrm{Dist}(x[t_i], Y) = 0.$$

Since $\mathrm{Dist}(x, Y) \ge 0$ for all x, we have
$\liminf_{j \to \infty} \mathrm{Dist}(x[j], Y) \ge 0$. Then,

$$\limsup_{j \to \infty} \mathrm{Dist}(x[j], Y) \le 0 \le \liminf_{j \to \infty} \mathrm{Dist}(x[j], Y).$$

On the other hand, by definition of liminf and limsup, we have

$$\liminf_{j \to \infty} \mathrm{Dist}(x[j], Y) \le \limsup_{j \to \infty} \mathrm{Dist}(x[j], Y).$$

Thus,

$$\liminf_{j \to \infty} \mathrm{Dist}(x[j], Y) = \limsup_{j \to \infty} \mathrm{Dist}(x[j], Y) = 0.$$

Therefore, the limit of $\mathrm{Dist}(x[j], Y)$ exists and

$$\lim_{j \to \infty} \mathrm{Dist}(x[j], Y) = 0.$$
Case (ii): There are finitely many resilient points in $\{x[t]\}_{t=1}^{\infty}$. Let $t_0$ be the
largest index such that $x[t_0]$ is a resilient point. If there exists $t' > t_0$ such that
$x[t'] \in Y$, then $x[t] \in Y$ for all $t \ge t'$, i.e., $\mathrm{Dist}(x[t], Y) = 0$ for all
$t \ge t'$. Thus, the limit of $\mathrm{Dist}(x[t], Y)$ exists and

$$\lim_{t \to \infty} \mathrm{Dist}(x[t], Y) = 0.$$

Now we consider the scenario where $x[t] \notin Y$ for all $t > t_0$. Since $t_0$ is the largest index
such that $x[t_0]$ is a resilient point, then either $x[t] < \inf Y$ for all $t > t_0$ or
$x[t] > \sup Y$ for all $t > t_0$. By symmetry, it is enough to consider the case when
$x[t] < \inf Y$ for all $t > t_0$. By the update function (23), and the definition of $t_0$, it can be
seen that the subsequence $\{x[t]\}_{t=t_0}^{\infty}$ is an increasing sequence. In addition, we know
$x[t] < \inf Y \in \mathbb{R}$ for all $t > t_0$. By the Monotone Convergence Theorem, the limit of
$\{x[t]\}_{t=t_0}^{\infty}$ exists and $\lim_{t \to \infty} x[t] = x^* \le \inf Y$.

If $\lim_{t \to \infty} x[t] = x^* = \inf Y$, then $\lim_{t \to \infty} \mathrm{Dist}(x[t], Y) = 0$.

Now we consider the case when $\lim_{t \to \infty} x[t] = x^* < \inf Y$. Since $x^* < \inf Y$, there
exists $\epsilon > 0$ such that $x^* = \inf Y - \epsilon$. Let
$\rho = \sup_{p(\cdot) \in \mathcal{C}} p'(x^*)$. We show next that $\rho < 0$ provided that
$x^* = \inf Y - \epsilon$.

For each $p(\cdot) \in \mathcal{C}$, $p'(x^*) < 0$. Then,
$\rho = \sup_{p(\cdot) \in \mathcal{C}} p'(x^*) \le 0$.

Let $h'_{i_1}(x^*), \ldots, h'_{i_{|\mathcal{N}|}}(x^*)$ be a non-increasing order of $h_j'(x^*)$, for
$j \in \mathcal{N}$. Define $q(x)$ as follows,

$$q(x) = \Big( 1 - \frac{n - 2f - 1}{2(n-f)} \Big) h_{i_1}(x)
+ \frac{1}{2(n-f)} \sum_{j=2}^{n-2f} h_{i_j}(x).$$

It can be easily seen that $q(\cdot) \in \mathcal{C}$ is a valid function and

$$\sup_{p(\cdot) \in \mathcal{C}} p'(x^*) = q'(x^*).$$

Thus, if $x^* = \inf Y - \epsilon$ for some $\epsilon > 0$, then $\rho = q'(x^*) < 0$.

From the update function (23), we have

$$x[t_0 + m + 1] = x[t_0 + m] - \lambda[t_0 + m]\, g[t_0 + m]
= x[t_0] - \sum_{j=t_0}^{t_0+m} \lambda[j]\, g[j]
\overset{(a)}{\ge} x[t_0] - \sum_{j=t_0}^{t_0+m} \lambda[j]\, \rho.$$

Inequality (a) is true because (1) the gradient of a convex function is non-decreasing,
(2) $x[t] \le x^*$ for each $t \ge t_0$, and (3) $\rho = \sup_{p(\cdot) \in \mathcal{C}} p'(x^*)$.

Letting $m \to \infty$, we have

$$\lim_{m \to \infty} x[t_0 + m + 1] \ge x[t_0] - \lim_{m \to \infty} \sum_{j=t_0}^{t_0+m} \lambda[j]\, \rho
= x[t_0] - \Big( \lim_{m \to \infty} \sum_{j=t_0}^{t_0+m} \lambda[j] \Big) \rho
= x[t_0] + \infty = \infty.$$

On the other hand, we know $\lim_{m \to \infty} x[t_0 + m + 1] \le x^* \in \mathbb{R}$. This is a
contradiction. Thus, $x^* = \inf Y$. Consequently, $\lim_{t \to \infty} \mathrm{Dist}(x[t], Y) = 0$.

This completes the proof. □


4.5 Algorithm 5: Non-Interleaved Iterative Algorithm II 

As commented before, the performance guarantee of Algorithm 4, in contrast to Algorithms 1, 2,
and 3, is independent of $|\mathcal{N}|$. Algorithm 5, described below, admits an analysis of the
tradeoff between $\beta$ and $\gamma$ in terms of $|\mathcal{N}|$.


Algorithm 5 for agent j: 


Initialization Step (i): Choose $x_j[0] \in X_j = \operatorname{argmin}_{x \in \mathbb{R}} h_j(x)$.

Initialization Step (ii): Perform exact Byzantine consensus with $x_j[0]$ as the input of agent j.
Set $x[0]$ to the output of the above consensus algorithm.

Iteration $t \ge 1$: Step 1: Compute $h_j'(x_j[t-1])$, and perform Byzantine broadcast of
$h_j'(x_j[t-1])$ to all the agents, using any Byzantine broadcast algorithm, such as [11].

In Step 1, agent j should receive a gradient from each agent $i \in \mathcal{V}$; let us denote the
gradient received from agent i in iteration t as $g_i[t-1]$.$^{10}$ If no gradient is, in fact,
received from agent i in iteration t via a Byzantine broadcast from i, then it must be the case that
agent i is faulty. In that case, agent i is isolated (i.e., removed from the system).$^{11}$ This
reduces the total number of agents n by 1, and the maximum number of faulty agents f is also reduced
by 1. Algorithm 5 is restarted (from Step 1) using the new parameters n and f.

Step 2: Due to the restart mechanism above, the algorithm progresses to Step 2 only when
agent j has received gradients from all other nodes. Let $\mathcal{R}[t-1]$ denote the multiset of the
gradients $\{g_1[t-1], g_2[t-1], \ldots, g_n[t-1]\}$ obtained in Step 1 of the t-th iteration. Drop the
smallest f values and the largest f. Denote the largest and smallest gradients among the
remaining values by $\hat g[t-1]$ and $\check g[t-1]$, respectively. Set
$g[t-1] = \frac{1}{2} (\hat g[t-1] + \check g[t-1])$.

$^{10}$ For each $i \in \mathcal{N}$, $g_i[t-1] = h_i'(x[t-1])$.

$^{11}$ Alternatively, the gradient value can be replaced by some default value (say 0).
Step 3: Let $\{\lambda[t]\}_{t=0}^{\infty}$ be a sequence of diminishing (non-increasing and
$\lambda[t] \to 0$) stepsizes chosen beforehand such that $\lambda[t] > 0$ for each t,
$\sum_{t=0}^{\infty} \lambda[t] = \infty$ and $\sum_{t=0}^{\infty} \lambda^2[t] < \infty$.

Compute $x[t]$ as

$$x[t] = x[t-1] - \lambda[t-1]\, g[t-1]. \qquad (27)$$
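The only difference from Algorithm 4 is the aggregate used in Step 2: the midpoint of the extreme
surviving gradients rather than their average. A minimal sketch, assuming the n gradients are given as
a list of floats (broadcast and restart logic omitted; `trimmed_midrange` and `algorithm5_step` are
names chosen here for illustration):

```python
from typing import List


def trimmed_midrange(values: List[float], f: int) -> float:
    """Drop the f smallest and f largest values, then return the midpoint
    of the largest and smallest remaining values."""
    n = len(values)
    if n <= 2 * f:
        raise ValueError("need more than 2f values")
    kept = sorted(values)[f:n - f]
    return 0.5 * (kept[0] + kept[-1])


def algorithm5_step(x_prev: float, gradients: List[float], f: int,
                    step: float) -> float:
    """One application of update (27)."""
    return x_prev - step * trimmed_midrange(gradients, f)
```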


Since $x[0]$ is the consensus value of the input $x_j[0]$ for each $j \in \mathcal{N}$, and $g[0]$ is
the consensus value of the non-faulty gradient $h_j'(x[0])$, for each $j \in \mathcal{N}$, the update
function (27) is well-defined for t = 1. By an inductive argument, we can show that the update (27) is
well-defined for all t.

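Let $\mathcal{C}$ be the collection of functions defined as follows:

$$\mathcal{C} = \Big\{ p(x) : p(x) = \sum_{i \in \mathcal{N}} \alpha_i h_i(x),\ \forall i \in \mathcal{N},\
\alpha_i \ge 0,\ \sum_{i \in \mathcal{N}} \alpha_i = 1,\ \text{and}\
\sum_{i \in \mathcal{N}} \mathbf{1}\Big(\alpha_i \ge \frac{1}{2(|\mathcal{N}| - f)}\Big) \ge |\mathcal{N}| - f \Big\} \qquad (28)$$

Each $p(x) \in \mathcal{C}$ is called a valid function. Note that the function
$\frac{1}{|\mathcal{N}|} \sum_{i \in \mathcal{N}} h_i(x) \in \mathcal{C}$. For ease of future reference,
we let $\tilde p(x) = \frac{1}{|\mathcal{N}|} \sum_{i \in \mathcal{N}} h_i(x)$. Define
$Y = \cup_{p(x) \in \mathcal{C}} \operatorname{argmin} p(x)$.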

Lemma 6. Y is a convex set.

The proof of Lemma 6 is similar to the proof of Lemma 4, and is presented in Appendix I.

Lemma 7. For each iteration t, there exists a valid function $p(x) \in \mathcal{C}$ such that
$g[t-1] = p'(x[t-1])$.

The proof of Lemma 7 is presented in Appendix J.

Theorem 10. When $n > 3f$, Algorithm 5 solves Problem 3 with $\beta = \frac{1}{2(|\mathcal{N}| - f)}$
and $\gamma = |\mathcal{N}| - f$.

The proof of Theorem 10 is similar to the proof of Theorem 9, and is omitted.

5 Algorithm 6: Suboptimal Algorithm 

Algorithms 1, 2, 3, 4 and 5 all use Byzantine broadcast as a subroutine, which may be costly. Unlike
these algorithms, the iterative Algorithm 6 presented below does not require the agents to exchange
their local cost functions or gradients. Instead, in Algorithm 6, each agent optimizes its local cost
function locally and exchanges the local optima, using an arbitrary Byzantine consensus algorithm.
In addition, the correctness proof of Algorithm 6 does not require each $h_j(\cdot)$ to be
differentiable. Thus Algorithm 6 also works for non-smooth functions.

Algorithm 6 is not an optimal algorithm. Specifically, Algorithm 6 only solves Problem 3 with
$\beta = \frac{1}{2|\mathcal{N}|}$ and $\gamma = \lceil \frac{n}{2} \rceil - \phi$, instead of the optimal
$\gamma^* = |\mathcal{N}| - f$ achieved by Algorithms 1, 2, 3 and 5. That is, Algorithm 6 is a
suboptimal algorithm for Problem 3 with $\beta = \frac{1}{2|\mathcal{N}|}$.
Algorithm 6 for agent j:

Step 1: Choose $v_j \in X_j = \operatorname{argmin}_{x \in \mathbb{R}} h_j(x)$.

Step 2: Send $v_j$ to all agents, and receive messages from all agents. Agent j should receive a value
from each agent $i \in \mathcal{V}$; let us denote the value received from agent i as $w_{ij}$. If no
value is, in fact, received from agent i, then $w_{ij}$ is set to be a predefined default value.

Sort $\{w_{ij}\}$ in a non-decreasing order, breaking ties arbitrarily, and set $x_j[0]$ to be the
median of this order, i.e., we choose $x_j[0]$ to be the $w_{ij}$ whose rank is
$\lceil \frac{n}{2} \rceil$.

Step 3: Perform exact Byzantine consensus with $x_j[0]$ as the input of agent j to the
consensus algorithm.

Set x to be the output of the above consensus algorithm, and output x.
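The local computation in Step 2 can be sketched as follows. This is a hedged illustration only: agents
are assumed to be numbered 0 to n - 1, the received values are held in a dictionary keyed by sender,
the default value 0.0 is an arbitrary choice, and the name `local_median` is hypothetical. The
Byzantine consensus of Step 3 is not modelled.

```python
from typing import Dict


def local_median(received: Dict[int, float], n: int,
                 default: float = 0.0) -> float:
    """Return the value of rank ceil(n / 2) among the n received values,
    substituting `default` for any sender that sent nothing."""
    values = sorted(received.get(i, default) for i in range(n))
    rank = (n + 1) // 2          # ceil(n / 2) for a positive integer n
    return values[rank - 1]      # ranks are 1-based
```

For example, with n = 5, received values 3, 1, 9 and 2 from agents 0, 1, 2 and 4, and nothing from
agent 3, the sorted list is 0, 1, 2, 3, 9 and the value of rank 3 is 2.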


Theorem 11. When $n > 3f$, Algorithm 6 solves Problem 3 with $\beta = \frac{1}{2|\mathcal{N}|}$ and
$\gamma = \lceil \frac{n}{2} \rceil - \phi$.

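Proof. Let $W_j$ denote the multiset obtained by agent j, i.e., $W_j = \{w_{1j}, \ldots, w_{nj}\}$. For
each $j \in \mathcal{N}$, define $W_j^+(x)$ and $W_j^-(x)$ as follows.

$$W_j^+(x) = \{i : i \in \mathcal{N} \text{ and } w_{ij} \ge x\},$$

$$W_j^-(x) = \{i : i \in \mathcal{N} \text{ and } w_{ij} \le x\}.$$

Note that $W_j^-(x) \cup W_j^+(x) = \mathcal{N}$ for each $j \in \mathcal{N}$, and that $w_{ij} = v_i$
for each $i \in \mathcal{N}$. It should also be noted that $W_j^+(x)$ and $W_j^-(x)$ are not necessarily
disjoint.

For each j, since $x_j[0]$ is chosen to be the median of the non-decreasing order over $W_j$, at least
$\lceil \frac{n}{2} \rceil$ of the n values in $W_j$ are at most $x_j[0]$ and at least
$n - \lceil \frac{n}{2} \rceil + 1$ of them are at least $x_j[0]$. Since at most $\phi$ of these values
come from faulty agents, we have

$$|W_j^-(x_j[0])| = |\{i : i \in \mathcal{N} \text{ and } w_{ij} \le x_j[0]\}|
\ge \Big\lceil \frac{n}{2} \Big\rceil - \phi,$$

$$|W_j^+(x_j[0])| = |\{i : i \in \mathcal{N} \text{ and } w_{ij} \ge x_j[0]\}|
\ge n - \Big\lceil \frac{n}{2} \Big\rceil - \phi + 1 \ge \Big\lceil \frac{n}{2} \Big\rceil - \phi.$$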

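Let $i_0 \in \mathcal{N}$ and $j_0 \in \mathcal{N}$ be the agents such that $x_{i_0}[0] \le x_j[0]$ for
each $j \in \mathcal{N}$ and $x_{j_0}[0] \ge x_j[0]$ for each $j \in \mathcal{N}$. Since x is the output
of a correct exact consensus algorithm, by validity, we have $x_{i_0}[0] \le x \le x_{j_0}[0]$. Thus

$$\{i : i \in \mathcal{N},\ w_{i i_0} \le x_{i_0}[0]\} \subseteq \{i : i \in \mathcal{N},\ w_{i i_0} \le x\}
= \{i : i \in \mathcal{N},\ v_i \le x\}$$

and

$$\{i : i \in \mathcal{N},\ w_{i j_0} \ge x_{j_0}[0]\} \subseteq \{i : i \in \mathcal{N},\ w_{i j_0} \ge x\}
= \{i : i \in \mathcal{N},\ v_i \ge x\}.$$

Consequently, we have

$$|\{i : i \in \mathcal{N},\ v_i \le x\}| \ge |\{i : i \in \mathcal{N},\ w_{i i_0} \le x_{i_0}[0]\}|
= |W_{i_0}^-(x_{i_0}[0])| \ge \Big\lceil \frac{n}{2} \Big\rceil - \phi \qquad (29)$$

and

$$|\{i : i \in \mathcal{N},\ v_i \ge x\}| \ge |\{i : i \in \mathcal{N},\ w_{i j_0} \ge x_{j_0}[0]\}|
= |W_{j_0}^+(x_{j_0}[0])| \ge \Big\lceil \frac{n}{2} \Big\rceil - \phi. \qquad (30)$$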
Recall that $v_j \in X_j = \operatorname{argmin}_{x \in \mathbb{R}} h_j(x)$; then $h_i'(x) \ge 0$ for
each $i \in \{i : i \in \mathcal{N},\ v_i \le x\}$, and $h_i'(x) \le 0$ for each
$i \in \{i : i \in \mathcal{N},\ v_i \ge x\}$. Define A(x), B(x) and C(x) as follows.

$$A(x) = \{i : i \in \mathcal{N},\ h_i'(x) > 0\},$$

$$B(x) = \{i : i \in \mathcal{N},\ h_i'(x) < 0\},$$

$$C(x) = \{i : i \in \mathcal{N},\ h_i'(x) = 0\}.$$

We now consider two cases: (i) $A(x) = \emptyset$ or $B(x) = \emptyset$, and (ii) $A(x) \ne \emptyset$
and $B(x) \ne \emptyset$.


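Case (i): $A(x) = \emptyset$ or $B(x) = \emptyset$. If $B(x) = \emptyset$, then $h_i'(x) = 0$ for each
$i \in \{i : i \in \mathcal{N},\ v_i \ge x\}$. Then x is an optimum of the function

$$\frac{1}{|\{i : i \in \mathcal{N},\ v_i \ge x\}|} \sum_{j \in \{i : i \in \mathcal{N},\ v_i \ge x\}} h_j(x). \qquad (31)$$

As $|\{i : i \in \mathcal{N},\ v_i \ge x\}| \le |\mathcal{N}|$ and by (30), it holds that
$|\{i : i \in \mathcal{N},\ v_i \ge x\}| \ge \lceil \frac{n}{2} \rceil - \phi$. Thus, in (31) at least
$\lceil \frac{n}{2} \rceil - \phi$ non-faulty functions are assigned coefficients bounded below by
$\frac{1}{|\mathcal{N}|}$.

Similarly, we can show the case when $A(x) = \emptyset$.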


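Case (ii): $A(x) \ne \emptyset$ and $B(x) \ne \emptyset$. When $A(x) \ne \emptyset$ and
$B(x) \ne \emptyset$,

$$\sum_{i \in A(x)} h_i'(x) > 0 \quad \text{and} \quad \sum_{i \in B(x)} h_i'(x) < 0.$$

Then there exists $0 \le \zeta \le 1$ such that

$$0 = \zeta \Big( \sum_{i \in A(x)} h_i'(x) \Big) + (1 - \zeta) \Big( \sum_{i \in B(x)} h_i'(x) \Big).$$

In addition, by definition of C(x), we have

$$\zeta \Big( \sum_{i \in A(x)} h_i'(x) \Big) + (1 - \zeta) \Big( \sum_{i \in B(x)} h_i'(x) \Big)
+ \sum_{i \in C(x)} h_i'(x) = 0 + \sum_{i \in C(x)} h_i'(x) = 0 + 0 = 0.$$

Thus x is an optimum of

$$\chi \Big( \zeta \sum_{i \in A(x)} h_i(x) + (1 - \zeta) \sum_{i \in B(x)} h_i(x)
+ \sum_{i \in C(x)} h_i(x) \Big), \qquad (32)$$

where

$$\chi = \frac{1}{\zeta |A(x)| + (1 - \zeta) |B(x)| + |C(x)|}.$$

Since $0 \le \zeta \le 1$, either $\zeta \ge \frac{1}{2}$ or $1 - \zeta \ge \frac{1}{2}$. WLOG, assume
$\zeta \ge \frac{1}{2}$. We have

$$\zeta |A(x)| + (1 - \zeta) |B(x)| + |C(x)| \le |A(x)| + |B(x)| + |C(x)|
= |A(x) \cup B(x) \cup C(x)| = |\mathcal{N}|.$$

In addition, since $A(x) \cup C(x) \supseteq \{i : i \in \mathcal{N} \text{ and } v_i \le x\}$ and
$B(x) \cup C(x) \supseteq \{i : i \in \mathcal{N} \text{ and } v_i \ge x\}$, by definition of x, we have
$|A(x) \cup C(x)| \ge \lceil \frac{n}{2} \rceil - \phi$ and
$|B(x) \cup C(x)| \ge \lceil \frac{n}{2} \rceil - \phi$. Then in (32), at least
$\lceil \frac{n}{2} \rceil - \phi$ non-faulty functions are assigned weights at least
$\frac{1}{2|\mathcal{N}|}$. A similar result holds when $1 - \zeta \ge \frac{1}{2}$.

Cases (i) and (ii) together prove the theorem. □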

6 Extensions 

Many extensions of these results are possible. The results obtained in this technical report can be 
extended to the case when the functions are sub-differentiable, with slightly more involved analysis. 
This generalization will be presented in another technical report. 

We have also obtained a comparable set of results for the case when the cost functions are 
redundant in some manner (e.g., cost function of agent 3 may equal a convex combination of cost 
functions of agents 1 and 2), or the optimal sets of the local cost functions are guaranteed to overlap. 
These results will also be presented elsewhere. 

Finally, if the underlying communication channel is a broadcast channel (over which all
transmissions are received correctly and identically by all agents), then the results presented in this
report can be proved for $n \ge 2f + 1$.

7 Summary 

In this paper, we introduce the problem of Byzantine fault-tolerant optimization, and obtain an impossibility result for the problem. The impossibility result provides an upper bound on the number of local cost functions of non-faulty nodes that can non-trivially affect the output of a weighted optimization function. We also present algorithms that match this upper bound. In addition, a low-complexity suboptimal algorithm is presented.
Appendices


A Proposition 5 


Proposition 5 is used in proving the correctness of other results in the paper. 

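Proposition 5. Let $\alpha_i \ge 0$ for $i \in \mathcal{N}$ and $\sum_{i \in \mathcal{N}} \alpha_i = 1$. Consider admissible functions $h_i(x)$, $i \in \mathcal{N}$, with $X_i = \arg\min_{x \in \mathbb{R}} h_i(x)$. Define $X$ as

$$X = \arg\min_{x \in \mathbb{R}} \sum_{i \in \mathcal{N}} \alpha_i h_i(x). \tag{33}$$

Then

$$X \subseteq \mathrm{Cov}\big( \cup_{i \in \mathcal{N}} X_i \big), \tag{34}$$

where $\mathrm{Cov}(\cup_{i \in \mathcal{N}} X_i)$ is the convex hull of the set $\cup_{i \in \mathcal{N}} X_i$.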

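Proof. By definition of admissible functions, $X_i$ is nonempty for all $i \in \mathcal{N}$. Then $\mathrm{Cov}(\cup_{i \in \mathcal{N}} X_i) \neq \emptyset$.

If $X = \emptyset$, then $X \subseteq \mathrm{Cov}(\cup_{i \in \mathcal{N}} X_i)$ holds trivially.

It remains to be shown that when $X \neq \emptyset$, $X \subseteq \mathrm{Cov}(\cup_{i \in \mathcal{N}} X_i)$ is also true. We prove this by contradiction.

Suppose that $X \neq \emptyset$ and $X \not\subseteq \mathrm{Cov}(\cup_{i \in \mathcal{N}} X_i)$. Then there exists a value $x_0 \in X$ that is not contained in $\mathrm{Cov}(\cup_{i \in \mathcal{N}} X_i)$. Recall that, by definition of admissible functions, $X_i$ is compact (closed and bounded) for each $i \in \mathcal{N}$. Then $\mathrm{Cov}(\cup_{i \in \mathcal{N}} X_i)$ is both convex and compact. To simplify notation, let $[a, b] = \mathrm{Cov}(\cup_{i \in \mathcal{N}} X_i)$. In addition, $\sum_{i \in \mathcal{N}} \alpha_i h_i'(x_0)$ is the gradient of the function $\sum_{i \in \mathcal{N}} \alpha_i h_i(x)$ at $x = x_0$.

As $x_0 \notin \mathrm{Cov}(\cup_{i \in \mathcal{N}} X_i)$, either $x_0 < a$ or $x_0 > b$. By definition, $a$ is the smallest point at which there exists $i \in \mathcal{N}$ such that $h_i'(a) = 0$. If $x_0 < a$, then $h_i'(x_0) < 0$ for each $i \in \mathcal{N}$; otherwise the minimality of $a$ would be violated. In addition, since $\alpha_i \ge 0$ for each $i \in \mathcal{N}$ and $\sum_{i \in \mathcal{N}} \alpha_i = 1$, we get $\sum_{i \in \mathcal{N}} \alpha_i h_i'(x_0) < 0$. However, since $x_0 \in X$, by optimality of $x_0$ it must be that $\sum_{i \in \mathcal{N}} \alpha_i h_i'(x_0) = 0$. This leads to a contradiction.

Similarly, we can derive a contradiction when $x_0 > b$. □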
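As a concrete instance of Proposition 5, consider the special case of quadratic costs $h_i(x) = (x - c_i)^2$, for which each $X_i = \{c_i\}$. The sketch below is an illustration under that assumption only; the helper name `weighted_quadratic_minimizer` is hypothetical. For such costs the minimizer of the weighted sum has a closed form, and it is a convex combination of the $c_i$, so it lies in the convex hull $[\min_i c_i, \max_i c_i]$.

```python
def weighted_quadratic_minimizer(centers, weights):
    """Minimizer of sum_i w_i * (x - c_i)^2 for weights w_i >= 0 that sum to 1."""
    assert all(w >= 0 for w in weights)
    assert abs(sum(weights) - 1) < 1e-12
    # Setting the derivative 2 * sum_i w_i * (x - c_i) to zero gives x = sum_i w_i * c_i.
    return sum(w * c for w, c in zip(weights, centers))

# Illustrative values (assumptions, not taken from the report).
centers = [-1.0, 0.0, 2.0]
weights = [0.2, 0.5, 0.3]
x_star = weighted_quadratic_minimizer(centers, weights)
inside_hull = min(centers) <= x_star <= max(centers)
```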


B Proof of Theorem 1 

Proof. Assume that $f > 0$.

The proof of the theorem is by contradiction.

Suppose that there exists a correct algorithm $\mathcal{A}$ that solves Problem 1. Define the cost functions of the $n$ agents as follows.

- $h_1(x) = (x + 1)^2$,

- $h_n(x) = (x - 1)^2$, and

- $h_i(x) = x^2 + i$, where $2 \le i \le n - 1$.

Let $X_i$ be the optimal set of $h_i(x)$ for $i \in \mathcal{V}$. That is, $X_i = \arg\min_{x \in \mathbb{R}} h_i(x)$. It is easy to see that $X_1 = \{-1\}$, $X_n = \{1\}$, and for $2 \le i \le n - 1$, $X_i = \{0\}$. We consider two executions wherein $\mathcal{A}$ produces different outputs, and show that there exists a non-faulty agent that cannot distinguish these two executions.
The identities of the faulty agents in these two executions are different. In both executions, the faulty nodes follow the algorithm correctly with the above choice of cost functions.

Execution 1: In execution 1, let $\mathcal{N} = \{1, \dots, n - 1\}$ and $\mathcal{F} = \{n\}$. Since $\mathcal{A}$ is a correct algorithm, by Proposition 5 it follows that the output of the algorithm must be in $\mathrm{Cov}(\cup_{i=1}^{n-1} X_i) = [-1, 0]$ for all agents $i \in \{1, \dots, n - 1\}$. Note that Proposition 5 is stated and proved in Appendix A.

Execution 2: In execution 2, let $\mathcal{N} = \{2, \dots, n\}$ and $\mathcal{F} = \{1\}$. Since $\mathcal{A}$ is a correct algorithm, by Proposition 5 it follows that, in this case, the output of the algorithm must be in $\mathrm{Cov}(\cup_{i=2}^{n} X_i) = [0, 1]$ for all agents $i \in \{2, \dots, n\}$.

The agents in $\{2, \dots, n - 1\}$ cannot distinguish between the above two executions, and hence must produce identical output in both cases. That is, their output must be 0 since $[-1, 0] \cap [0, 1] = \{0\}$. (When $f > 0$, $n \ge 3f + 1 \ge 4$. Thus, the set $\{2, \dots, n - 1\}$ is non-empty.)

On the other hand, it is easy to see that $\sum_{i=1}^{n-1} h_i'(0) \neq 0$ and $\sum_{i=2}^{n} h_i'(0) \neq 0$, contradicting the hypothesis that 0 is an optimal solution for either execution. Here $h_i'(x)$ is the derivative of function $h_i(\cdot)$ at $x$ for each $1 \le i \le n$. This contradicts the assumption that $\mathcal{A}$ is correct and the proof is complete. □
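The last step of the argument is a direct computation. The short sketch below (the function name `derivative_sums_at_zero` is an illustrative assumption) evaluates the two gradient sums at $x = 0$ for the cost functions defined in this proof.

```python
def derivative_sums_at_zero(n):
    """Gradient sums at x = 0 for the two executions in the proof of Theorem 1."""
    def dh(i, x):
        if i == 1:
            return 2 * (x + 1)   # derivative of (x + 1)^2
        if i == n:
            return 2 * (x - 1)   # derivative of (x - 1)^2
        return 2 * x             # derivative of x^2 + i
    exec1 = sum(dh(i, 0) for i in range(1, n))       # non-faulty agents 1..n-1
    exec2 = sum(dh(i, 0) for i in range(2, n + 1))   # non-faulty agents 2..n
    return exec1, exec2

sums = derivative_sums_at_zero(4)  # (2, -2): neither sum is zero
```

Only agents 1 and $n$ contribute at $x = 0$, so the two sums are $2$ and $-2$ for every $n \ge 4$, and 0 is not a minimizer of the uniform sum in either execution.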

C Proof of Theorem 2 

Proof. Recall that we assume $n \ge 3f + 1$ and that we denote $|\mathcal{F}| = \phi$. Let $h_1(x), \dots, h_n(x)$ be defined as follows.

- $h_i(x) = (x - i)^2$, for $1 \le i \le f$ and $n - \phi + 1 \le i \le n$.

In this case, the optimum for $h_i(x)$ is at $x = i$.

- $h_i(x) = (x - a)^2$, for $f + 1 \le i \le n - \phi$, where $a = f + 1$.

In this case, the optimum for $h_i(x)$ is at $x = a = f + 1$.

From a non-faulty agent $j$'s perspective, any subset of $f$ agents may be faulty. Assume that, if agent $k$ is faulty, then aside from choosing its cost function as specified above, agent $k$ does not behave incorrectly. Thus, all agents follow any specified algorithm correctly.

To show the impossibility claim of the theorem, consider any correct algorithm.

Now, let us consider any non-faulty agent $j$ where $f + 1 \le j \le n - \phi$. Consider two possible cases:

Case 1: In this case, suppose that agents 1 through $n - \phi$ are non-faulty, and agents $n - \phi + 1$ through $n$ are faulty. For the local cost functions (specified above) for the non-faulty agents in this case, the optima are in the interval $[1, a]$. Then by Proposition 5, for the output $x$ it must be true that $x \in [1, a]$. (Recall that Proposition 5 is stated and proved in Appendix A.)

Case 2: In Case 2, suppose that agents $f + 1$ through $n$ are non-faulty, and agents 1 through $f$ are faulty. For the local cost functions (specified above) for the non-faulty agents in this case, the optima are in the interval $[a, n]$. Then by Proposition 5, for the output $x$ it must be true that $x \in [a, n]$.

Since the non-faulty agent $j$ does not know the actual number of faulty agents in the system, it cannot distinguish between the above two cases, and so it must choose identical output in both cases. Therefore, the output must be in $[1, a] \cap [a, n]$; that is, the output at non-faulty agent $j$ must equal $a = f + 1$. Therefore, all non-faulty agents must output $a = f + 1$ in both cases.
Now suppose that Case 1 holds, i.e., agents $n - \phi + 1$ through $n$ are faulty. By the requirements of Problem 3, there exists a collection of weights $\alpha_i$ such that $x = a$ is an optimum of the objective

$$\sum_{i=1}^{n - \phi} \alpha_i h_i(x). \tag{35}$$

Thus, $\sum_{i=1}^{n - \phi} \alpha_i h_i'(a) = 0$, where $h_i'(x)$ denotes the derivative of function $h_i(\cdot)$ at $x$.

Recall that $a = f + 1$. By construction of $h_1(x), \dots, h_{n - \phi}(x)$, we know $h_i'(a) = 0$ for $f + 1 \le i \le n - \phi$ and $h_i'(a) > 0$ for $1 \le i \le f$. Thus

$$\sum_{i=1}^{n - \phi} \alpha_i h_i'(a) = \sum_{i=1}^{f} \alpha_i h_i'(a).$$

For $1 \le i \le f$, since $h_i'(a) > 0$ and $\alpha_i \ge 0$ it holds that $\alpha_i h_i'(a) \ge 0$, where equality holds if and only if $\alpha_i = 0$. Thus, $\sum_{i=1}^{f} \alpha_i h_i'(a) = 0$ implies that $\alpha_i h_i'(a) = 0$ for $1 \le i \le f$. Then $\alpha_i = 0$ for $1 \le i \le f$.

Since there are $|\mathcal{N}|$ non-faulty agents (1 through $n - \phi$), and weight $\alpha_i = 0$ for $1 \le i \le f$, it follows that at most $|\mathcal{N}| - f$ of the weights of the non-faulty agents in Case 1 are non-zero.

Thus, regardless of the value of parameter $\beta$ in Problem 3 (where $\beta > 0$), if $\gamma$ exceeds $|\mathcal{N}| - f$, no algorithm can solve Problem 3. □
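The construction can be instantiated for small parameters to see which agents are forced to zero weight. This is a sketch under the definitions given in this proof; the function name `forced_zero_weights` and the sample values of $n$, $f$, $\phi$ are illustrative assumptions.

```python
def forced_zero_weights(n, f, phi):
    """Non-faulty agents (Case 1) whose weight must be 0 when the output is a = f + 1."""
    a = f + 1

    def center(i):
        # optimum of h_i: i for the first f and the last phi agents, a otherwise
        return i if (i <= f or i >= n - phi + 1) else a

    nonfaulty = range(1, n - phi + 1)                     # agents 1..n-phi
    grads = {i: 2 * (a - center(i)) for i in nonfaulty}   # h_i'(a) for h_i(x) = (x - center)^2
    assert all(g >= 0 for g in grads.values())            # no negative term can cancel a positive one
    return [i for i in nonfaulty if grads[i] > 0]

zeroed = forced_zero_weights(n=7, f=2, phi=1)  # [1, 2]
```

For $n = 7$, $f = 2$, $\phi = 1$ there are $|\mathcal{N}| = 6$ non-faulty agents in Case 1, and agents 1 and 2 must receive weight 0, leaving at most $|\mathcal{N}| - f = 4$ non-zero weights.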


D Proof of Proposition 1 


Proof. We first show that $F(x)$ is a non-decreasing function.

Choose any $x \in \mathbb{R}$, and choose any $y \ge x$. Let $S_y$ and $S_x$ be sets such that $\sum_{i \in A(y) - S_y} h_i'(y)$ and $\sum_{i \in A(x) - S_x} h_i'(x)$ are minimized, respectively.

Since $h_i(\cdot)$ is convex, $h_i'(\cdot)$ is non-decreasing. By definition of $A(\cdot)$ we have $A(x) \subseteq A(y)$, i.e., $A(\cdot)$ is non-decreasing. In addition, $0 \le |A(\cdot)| \le n$. Similarly, we can show that $B(y) \subseteq B(x)$ and $0 \le |B(\cdot)| \le n$.

$$\begin{aligned}
F(y) - F(x) &= \sum_{i \in A(y) - S_y} h_i'(y) - \sum_{i \in A(x) - S_x} h_i'(x) \\
&= \Big( \sum_{i \in A(y) - S_y - S_x} h_i'(y) + \sum_{i \in S_x \cap A(y) - S_y} h_i'(y) \Big) - \Big( \sum_{i \in A(x) - S_x - S_y} h_i'(x) + \sum_{i \in S_y \cap A(x) - S_x} h_i'(x) \Big) \\
&= \Big( \sum_{i \in A(y) - S_y - S_x} h_i'(y) - \sum_{i \in A(x) - S_x - S_y} h_i'(x) \Big) + \Big( \sum_{i \in S_x \cap A(y) - S_y} h_i'(y) - \sum_{i \in S_y \cap A(x) - S_x} h_i'(x) \Big) \\
&\overset{(a)}{\ge} \Big( \sum_{i \in A(x) - S_y - S_x} h_i'(y) - \sum_{i \in A(x) - S_x - S_y} h_i'(x) \Big) + \Big( \sum_{i \in S_x \cap A(y) - S_y} h_i'(y) - \sum_{i \in S_y \cap A(x) - S_x} h_i'(x) \Big) \\
&\overset{(b)}{=} \Big( \sum_{i \in A(x) - S_y - S_x} h_i'(y) - \sum_{i \in A(x) - S_x - S_y} h_i'(x) \Big) + \Big( \sum_{i \in S_x - S_y} h_i'(y) - \sum_{i \in S_y \cap A(x) - S_x} h_i'(x) \Big) \\
&\overset{(c)}{\ge} \sum_{i \in S_x - S_y} h_i'(y) - \sum_{i \in S_y \cap A(x) - S_x} h_i'(x).
\end{aligned} \tag{36}$$

Inequality (a) follows from the fact that $A(x) \subseteq A(y)$ and $h_i'(y) \ge 0$ for each $i \in A(y)$; equality (b) is true since $S_x \subseteq A(x) \subseteq A(y)$; and inequality (c) holds because $h_i'(\cdot)$ is non-decreasing.

Now consider two cases: (i) $|S_x| < f$ and (ii) $|S_x| = f$.

Case (i): $|S_x| < f$. In this case, we have $S_x = A(x)$, and

$$\sum_{i \in S_x - S_y} h_i'(y) - \sum_{i \in S_y \cap A(x) - S_x} h_i'(x) = \sum_{i \in S_x - S_y} h_i'(y) - \sum_{i \in \emptyset} h_i'(x) = \sum_{i \in S_x - S_y} h_i'(y) - 0 \ge 0. \tag{37}$$

Case (ii): $|S_x| = f$. Because $S_x \subseteq A(x) \subseteq A(y)$, if $|S_x| = f$, we have $|A(y)| \ge f$. Then, by definition of $S_y$, it holds that $|S_y| = f$. Now,

$$\begin{aligned}
|S_x - S_y| &= |S_x - S_x \cap S_y| = |S_x| - |S_x \cap S_y| = f - |S_x \cap S_y| \\
&= |S_y| - |S_x \cap S_y| = |S_y - S_x \cap S_y| \ge |S_y \cap A(x) - S_x \cap S_y| \ge |S_y \cap A(x) - S_x|.
\end{aligned}$$

Thus, $|S_x - S_y| \ge |S_y \cap A(x) - S_x|$.
By definition of $S_x$, for each $i \in S_x - S_y$ and $j \in S_y \cap A(x) - S_x$, at point $x$ we have $h_i'(x) \ge h_j'(x)$, i.e., $h_i'(x) \ge \max_{j \in S_y \cap A(x) - S_x} h_j'(x)$. We have

$$\begin{aligned}
\sum_{i \in S_x - S_y} h_i'(y) - \sum_{i \in S_y \cap A(x) - S_x} h_i'(x)
&\overset{(a)}{\ge} \sum_{i \in S_x - S_y} h_i'(x) - \sum_{i \in S_y \cap A(x) - S_x} h_i'(x) \\
&\ge \sum_{i \in S_x - S_y} \max_{j \in S_y \cap A(x) - S_x} h_j'(x) - \sum_{i \in S_y \cap A(x) - S_x} h_i'(x) \\
&\overset{(b)}{\ge} \sum_{i \in S_y \cap A(x) - S_x} \max_{j \in S_y \cap A(x) - S_x} h_j'(x) - \sum_{i \in S_y \cap A(x) - S_x} h_i'(x) \\
&\ge 0,
\end{aligned} \tag{38}$$

where (a) holds due to the fact that $h_i'(\cdot)$ is non-decreasing and that $y \ge x$, and (b) holds because $|S_x - S_y| \ge |S_y \cap A(x) - S_x|$ and, for $j \in S_y \cap A(x)$, $h_j'(x) \ge 0$.

Therefore, from (36), (37) and (38), we have that $F(y) - F(x) \ge 0$ for $y \ge x$, i.e., $F(\cdot)$ is non-decreasing.


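$G(\cdot)$ is non-decreasing

Now we show that $G(\cdot)$ is also non-decreasing. Choose any $x \in \mathbb{R}$, and choose any $y \ge x$. Let $S_y$ and $S_x$ be sets such that $\sum_{i \in B(y) - S_y} h_i'(y)$ and $\sum_{i \in B(x) - S_x} h_i'(x)$ are maximized, respectively.

$$\begin{aligned}
G(y) - G(x) &= \sum_{i \in B(y) - S_y} h_i'(y) - \sum_{i \in B(x) - S_x} h_i'(x) \\
&= \Big( \sum_{i \in B(y) - S_y - S_x} h_i'(y) + \sum_{i \in S_x \cap B(y) - S_y} h_i'(y) \Big) - \Big( \sum_{i \in B(x) - S_x - S_y} h_i'(x) + \sum_{i \in S_y \cap B(x) - S_x} h_i'(x) \Big) \\
&= \Big( \sum_{i \in B(y) - S_y - S_x} h_i'(y) - \sum_{i \in B(x) - S_x - S_y} h_i'(x) \Big) + \Big( \sum_{i \in S_x \cap B(y) - S_y} h_i'(y) - \sum_{i \in S_y \cap B(x) - S_x} h_i'(x) \Big) \\
&\overset{(a)}{\ge} \Big( \sum_{i \in B(y) - S_y - S_x} h_i'(x) - \sum_{i \in B(x) - S_x - S_y} h_i'(x) \Big) + \Big( \sum_{i \in S_x \cap B(y) - S_y} h_i'(y) - \sum_{i \in S_y \cap B(x) - S_x} h_i'(x) \Big) \\
&\overset{(b)}{\ge} \Big( \sum_{i \in B(x) - S_y - S_x} h_i'(x) - \sum_{i \in B(x) - S_x - S_y} h_i'(x) \Big) + \Big( \sum_{i \in S_x \cap B(y) - S_y} h_i'(y) - \sum_{i \in S_y \cap B(x) - S_x} h_i'(x) \Big) \\
&\overset{(c)}{=} 0 + \Big( \sum_{i \in S_x \cap B(y) - S_y} h_i'(y) - \sum_{i \in S_y - S_x} h_i'(x) \Big).
\end{aligned} \tag{39}$$

Inequality (a) holds due to the fact that $h_i'(\cdot)$ is non-decreasing and $x \le y$; inequality (b) follows from the fact that $B(y) \subseteq B(x)$ and $h_i'(x) \le 0$ for each $i \in B(x)$; and equality (c) is true because $S_y \subseteq B(y) \subseteq B(x)$.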

Now consider two cases: (i) $|S_y| < f$ and (ii) $|S_y| = f$.

Case (i): $|S_y| < f$. In this case, we have $S_y = B(y)$, and

$$\sum_{i \in S_x \cap B(y) - S_y} h_i'(y) - \sum_{i \in S_y - S_x} h_i'(x) = \sum_{i \in \emptyset} h_i'(y) - \sum_{i \in S_y - S_x} h_i'(x) = 0 - \sum_{i \in S_y - S_x} h_i'(x) \ge 0. \tag{40}$$

Case (ii): $|S_y| = f$. Because $S_y \subseteq B(y) \subseteq B(x)$, if $|S_y| = f$, we have $|B(x)| \ge f$. Then, by definition of $S_x$, it holds that $|S_x| = f$. Now

$$\begin{aligned}
|S_y - S_x| &= |S_y - S_x \cap S_y| = |S_y| - |S_x \cap S_y| = f - |S_x \cap S_y| \\
&= |S_x| - |S_x \cap S_y| = |S_x - S_x \cap S_y| \ge |S_x \cap B(y) - S_x \cap S_y| \ge |S_x \cap B(y) - S_y|.
\end{aligned}$$

Thus $|S_y - S_x| \ge |S_x \cap B(y) - S_y|$.

By definition of $S_y$, for each $i \in S_y - S_x$ and $j \in S_x \cap B(y) - S_y$, at point $y$ we have $h_i'(y) \le \min_{j \in S_x \cap B(y) - S_y} h_j'(y)$. We have

$$\begin{aligned}
\sum_{i \in S_x \cap B(y) - S_y} h_i'(y) - \sum_{i \in S_y - S_x} h_i'(x)
&\overset{(a)}{\ge} \sum_{i \in S_x \cap B(y) - S_y} h_i'(y) - \sum_{i \in S_y - S_x} h_i'(y) \\
&\ge \sum_{i \in S_x \cap B(y) - S_y} h_i'(y) - \sum_{i \in S_y - S_x} \min_{j \in S_x \cap B(y) - S_y} h_j'(y) \\
&\overset{(b)}{\ge} \sum_{i \in S_x \cap B(y) - S_y} h_i'(y) - \sum_{i \in S_x \cap B(y) - S_y} \min_{j \in S_x \cap B(y) - S_y} h_j'(y) \\
&\ge 0,
\end{aligned} \tag{41}$$

where (a) holds due to the fact that $h_i'(\cdot)$ is non-decreasing and that $y \ge x$, and (b) holds because $|S_y - S_x| \ge |S_x \cap B(y) - S_y|$ and for each $j \in B(y)$, $h_j'(y) \le 0$.

Therefore, from (39), (40) and (41), we have $G(y) - G(x) \ge 0$ for $y \ge x$, i.e., $G(\cdot)$ is non-decreasing.

□ 
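The monotonicity of $F$ and $G$ can be spot-checked numerically. The sketch below assumes, as the proof above suggests, that $F(x)$ is the sum of the positive gradients at $x$ after removing the (up to) $f$ largest of them, and that $G(x)$ is the sum of the negative gradients after removing the (up to) $f$ most negative. The Python names `F` and `G` and the quadratic test costs are illustrative assumptions.

```python
def F(grads_at_x, f):
    """Sum over A(x) - S_x: positive gradients with the f largest removed."""
    pos = sorted(g for g in grads_at_x if g > 0)
    # removing the f largest positive terms minimises the remaining sum
    return sum(pos[:max(len(pos) - f, 0)])

def G(grads_at_x, f):
    """Sum over B(x) - S_x: negative gradients with the f most negative removed."""
    neg = sorted(g for g in grads_at_x if g < 0)
    # removing the f most negative terms maximises the remaining sum
    return sum(neg[f:])

# Illustrative costs h_i(x) = (x - c_i)^2, so h_i'(x) = 2 * (x - c_i).
centers = [0, 1, 2, 5]
xs = [k / 4 for k in range(-8, 33)]
F_values = [F([2 * (x - c) for c in centers], 1) for x in xs]
G_values = [G([2 * (x - c) for c in centers], 1) for x in xs]
```

On this grid both `F_values` and `G_values` are non-decreasing, in line with Proposition 1. A grid check is of course not a proof; it only guards against misreading the definitions.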


E Proof of Proposition 2 

Proof. We first show that $F(x)$ is continuous. We will use the non-decreasing nature of $F(\cdot)$ proved above in Proposition 1.

Recall that each $h_i(x)$ is continuously differentiable, i.e., $h_i'(x)$ is continuous. Then, for every $\epsilon > 0$ there exists a $\delta > 0$ such that for all $x \in (c - \delta, c + \delta)$ the following holds for all $i \in \mathcal{N}$,

$$|h_i'(x) - h_i'(c)| < \epsilon. \tag{42}$$

To show $F(x)$ is continuous, we need to show that

$$|x - c| < \delta \Rightarrow |F(x) - F(c)| < \epsilon. \tag{43}$$

Suppose $|x - c| < \delta$ holds for some $\delta > 0$; then $c - \delta < x < c + \delta$. Let $S_{c+\delta}$ and $S_c$ be the subsets of $A(c + \delta)$ and $A(c)$, where $|S_{c+\delta}| \le f$ and $|S_c| \le f$, such that $\sum_{i \in A(c+\delta) - S_{c+\delta}} h_i'(c + \delta)$ and $\sum_{i \in A(c) - S_c} h_i'(c)$ are minimized, respectively. Note that $A(c) \subseteq A(c + \delta)$.
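We have

$$\begin{aligned}
F(x) - F(c) &\overset{(a)}{\le} F(c + \delta) - F(c) \\
&= \sum_{i \in A(c+\delta) - S_{c+\delta}} h_i'(c + \delta) - \sum_{i \in A(c) - S_c} h_i'(c) \\
&= \Big( \sum_{i \in A(c+\delta) - S_{c+\delta} - S_c} h_i'(c + \delta) + \sum_{i \in A(c+\delta) \cap S_c - S_{c+\delta}} h_i'(c + \delta) \Big) - \Big( \sum_{i \in A(c) - S_{c+\delta} - S_c} h_i'(c) + \sum_{i \in S_{c+\delta} \cap A(c) - S_c} h_i'(c) \Big) \\
&\overset{(b)}{=} \Big( \sum_{i \in A(c+\delta) - S_{c+\delta} - S_c} h_i'(c + \delta) + \sum_{i \in S_c - S_{c+\delta}} h_i'(c + \delta) \Big) - \Big( \sum_{i \in A(c) - S_{c+\delta} - S_c} h_i'(c) + \sum_{i \in S_{c+\delta} \cap A(c) - S_c} h_i'(c) \Big) \\
&= \Big( \sum_{i \in A(c+\delta) - S_{c+\delta} - S_c} h_i'(c + \delta) - \sum_{i \in A(c) - S_{c+\delta} - S_c} h_i'(c) \Big) + \Big( \sum_{i \in S_c - S_{c+\delta}} h_i'(c + \delta) - \sum_{i \in S_{c+\delta} \cap A(c) - S_c} h_i'(c) \Big) \\
&\overset{(c)}{\le} \Big( \sum_{i \in A(c+\delta) - S_{c+\delta} - S_c} h_i'(c + \delta) - \sum_{i \in A(c+\delta) - S_{c+\delta} - S_c} h_i'(c) \Big) + \Big( \sum_{i \in S_c - S_{c+\delta}} h_i'(c + \delta) - \sum_{i \in S_{c+\delta} - S_c} h_i'(c) \Big) \\
&\overset{(d)}{\le} \Big( \sum_{i \in A(c+\delta) - S_{c+\delta} - S_c} h_i'(c + \delta) - \sum_{i \in A(c+\delta) - S_{c+\delta} - S_c} h_i'(c) \Big) + \Big( \sum_{i \in S_{c+\delta} - S_c} h_i'(c + \delta) - \sum_{i \in S_{c+\delta} - S_c} h_i'(c) \Big),
\end{aligned} \tag{44}$$

where (a) holds due to monotonicity of $F(\cdot)$; equality (b) is true since $S_c \subseteq A(c) \subseteq A(c + \delta)$; inequality (c) follows from the fact that $h_i'(c) \le 0$ for each $i \notin A(c)$ and $A(c) \subseteq A(c + \delta)$; and inequality (d) holds because, as shown next,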

$$\sum_{i \in S_c - S_{c+\delta}} h_i'(c + \delta) \le \sum_{i \in S_{c+\delta} - S_c} h_i'(c + \delta). \tag{45}$$

Now, observing that $|S_c| \le |S_{c+\delta}|$, we get

$$|S_c - S_{c+\delta}| = |S_c - S_c \cap S_{c+\delta}| = |S_c| - |S_c \cap S_{c+\delta}| \le |S_{c+\delta}| - |S_c \cap S_{c+\delta}| = |S_{c+\delta} - S_c|.$$

In addition, by definition of $S_{c+\delta}$, for each $i \in S_c - S_{c+\delta}$ and $j \in S_{c+\delta} - S_c$, $h_i'(c + \delta) \le h_j'(c + \delta)$. Then,

$$\sum_{i \in S_c - S_{c+\delta}} h_i'(c + \delta) \le \sum_{i \in S_c - S_{c+\delta}} \min_{j \in S_{c+\delta} - S_c} h_j'(c + \delta) \overset{(a)}{\le} \sum_{i \in S_{c+\delta} - S_c} \min_{j \in S_{c+\delta} - S_c} h_j'(c + \delta) \le \sum_{i \in S_{c+\delta} - S_c} h_i'(c + \delta),$$

where inequality (a) is true because $|S_c - S_{c+\delta}| \le |S_{c+\delta} - S_c|$ and $\min_{j \in S_{c+\delta} - S_c} h_j'(c + \delta) \ge 0$. This proves (45).
Then we have

$$\begin{aligned}
F(x) - F(c) &\le \Big( \sum_{i \in A(c+\delta) - S_{c+\delta} - S_c} h_i'(c + \delta) - \sum_{i \in A(c+\delta) - S_{c+\delta} - S_c} h_i'(c) \Big) + \Big( \sum_{i \in S_{c+\delta} - S_c} h_i'(c + \delta) - \sum_{i \in S_{c+\delta} - S_c} h_i'(c) \Big) && \text{due to (44)} \\
&\overset{(a)}{=} \sum_{i \in A(c+\delta) - S_c} \big( h_i'(c + \delta) - h_i'(c) \big) \\
&\overset{(b)}{\le} |A(c + \delta) - S_c| \, \epsilon \\
&\le n \epsilon.
\end{aligned}$$

Equality (a) follows because $(A(c + \delta) - S_{c+\delta} - S_c) \cup (S_{c+\delta} - S_c) = A(c + \delta) - S_c$ and the sets $A(c + \delta) - S_{c+\delta} - S_c$ and $S_{c+\delta} - S_c$ are disjoint. Inequality (b) follows from (42).

By an analogous argument, we can also show that for any $x \in (c - \delta, c + \delta)$,

$$F(x) - F(c) \ge -n \epsilon.$$

For completeness, we present the proof as follows.

Let $S_{c-\delta}$ and $S_c$ be the subsets of $A(c - \delta)$ and $A(c)$, where $|S_{c-\delta}| \le f$ and $|S_c| \le f$, such that $\sum_{i \in A(c-\delta) - S_{c-\delta}} h_i'(c - \delta)$ and $\sum_{i \in A(c) - S_c} h_i'(c)$ are minimized, respectively.

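$$\begin{aligned}
F(x) - F(c) &\ge F(c - \delta) - F(c) \\
&= \sum_{i \in A(c-\delta) - S_{c-\delta}} h_i'(c - \delta) - \sum_{i \in A(c) - S_c} h_i'(c) \\
&= \Big( \sum_{i \in A(c-\delta) - S_{c-\delta} - S_c} h_i'(c - \delta) + \sum_{i \in S_c \cap A(c-\delta) - S_{c-\delta}} h_i'(c - \delta) \Big) - \Big( \sum_{i \in A(c) - S_{c-\delta} - S_c} h_i'(c) + \sum_{i \in S_{c-\delta} \cap A(c) - S_c} h_i'(c) \Big) \\
&= \Big( \sum_{i \in A(c-\delta) - S_{c-\delta} - S_c} h_i'(c - \delta) - \sum_{i \in A(c) - S_{c-\delta} - S_c} h_i'(c) \Big) + \Big( \sum_{i \in S_c \cap A(c-\delta) - S_{c-\delta}} h_i'(c - \delta) - \sum_{i \in S_{c-\delta} \cap A(c) - S_c} h_i'(c) \Big) \\
&\overset{(a)}{\ge} \Big( \sum_{i \in A(c) - S_{c-\delta} - S_c} h_i'(c - \delta) - \sum_{i \in A(c) - S_{c-\delta} - S_c} h_i'(c) \Big) + \Big( \sum_{i \in S_c - S_{c-\delta}} h_i'(c - \delta) - \sum_{i \in S_{c-\delta} \cap A(c) - S_c} h_i'(c) \Big) \\
&\overset{(b)}{=} \Big( \sum_{i \in A(c) - S_{c-\delta} - S_c} h_i'(c - \delta) - \sum_{i \in A(c) - S_{c-\delta} - S_c} h_i'(c) \Big) + \Big( \sum_{i \in S_c - S_{c-\delta}} h_i'(c - \delta) - \sum_{i \in S_{c-\delta} - S_c} h_i'(c) \Big) \\
&= \sum_{i \in A(c) - S_{c-\delta} - S_c} \big( h_i'(c - \delta) - h_i'(c) \big) + \Big( \sum_{i \in S_c - S_{c-\delta}} h_i'(c - \delta) - \sum_{i \in S_{c-\delta} - S_c} h_i'(c) \Big).
\end{aligned}$$

Inequality (a) follows from the fact that $h_i'(c - \delta) \le 0$ for each $i \notin A(c - \delta)$ and $A(c - \delta) \subseteq A(c)$. Equality (b) is true because $S_{c-\delta} \subseteq A(c - \delta) \subseteq A(c)$. Now, observing that $|S_{c-\delta}| \le |S_c|$, we get

$$|S_c - S_{c-\delta}| = |S_c| - |S_c \cap S_{c-\delta}| \ge |S_{c-\delta}| - |S_c \cap S_{c-\delta}| = |S_{c-\delta} - S_c|. \tag{46}$$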
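In addition, we have

$$\begin{aligned}
\sum_{i \in S_c - S_{c-\delta}} h_i'(c - \delta) - \sum_{i \in S_{c-\delta} - S_c} h_i'(c)
&\overset{(a)}{\ge} \sum_{i \in S_c - S_{c-\delta}} h_i'(c - \delta) - \sum_{i \in S_{c-\delta} - S_c} \min_{j \in S_c - S_{c-\delta}} h_j'(c) \\
&\overset{(b)}{\ge} \sum_{i \in S_c - S_{c-\delta}} h_i'(c - \delta) - \sum_{i \in S_c - S_{c-\delta}} \min_{j \in S_c - S_{c-\delta}} h_j'(c) \\
&\ge \sum_{i \in S_c - S_{c-\delta}} \big( h_i'(c - \delta) - h_i'(c) \big).
\end{aligned} \tag{47}$$

Inequality (a) holds due to the fact that for each $i \in S_{c-\delta} - S_c$, $h_i'(c) \le \min_{j \in S_c - S_{c-\delta}} h_j'(c)$. Inequality (b) follows from (46) and the fact that $\min_{j \in S_c - S_{c-\delta}} h_j'(c) \ge 0$.

Thus

$$\begin{aligned}
F(x) - F(c) &\ge \sum_{i \in A(c) - S_{c-\delta} - S_c} \big( h_i'(c - \delta) - h_i'(c) \big) + \Big( \sum_{i \in S_c - S_{c-\delta}} h_i'(c - \delta) - \sum_{i \in S_{c-\delta} - S_c} h_i'(c) \Big) \\
&\ge \sum_{i \in A(c) - S_{c-\delta} - S_c} \big( h_i'(c - \delta) - h_i'(c) \big) + \sum_{i \in S_c - S_{c-\delta}} \big( h_i'(c - \delta) - h_i'(c) \big) && \text{from (47)} \\
&= \sum_{i \in A(c) - S_{c-\delta}} \big( h_i'(c - \delta) - h_i'(c) \big) \\
&\ge -n \epsilon && \text{from (42).}
\end{aligned}$$

Then we have, for any $\epsilon_0 = n \epsilon > 0$, there exists $\delta > 0$ such that

$$|x - c| < \delta \Rightarrow |F(x) - F(c)| < \epsilon_0.$$

Therefore, $F(\cdot)$ is continuous.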

Continuity of G (•) 

To show $G(\cdot)$ is continuous, we need to show that

$$|x - c| < \delta \Rightarrow |G(x) - G(c)| < \epsilon. \tag{48}$$

Suppose $|x - c| < \delta$ holds for some $\delta > 0$; then $c - \delta < x < c + \delta$. Let $S_{c+\delta}$ and $S_c$ be the subsets of $B(c + \delta)$ and $B(c)$, where $|S_{c+\delta}| \le f$ and $|S_c| \le f$, such that $\sum_{i \in B(c+\delta) - S_{c+\delta}} h_i'(c + \delta)$ and $\sum_{i \in B(c) - S_c} h_i'(c)$ are maximized, respectively.
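We have

$$\begin{aligned}
G(x) - G(c) &\le G(c + \delta) - G(c) && \text{because } G(\cdot) \text{ is non-decreasing} \\
&= \sum_{i \in B(c+\delta) - S_{c+\delta}} h_i'(c + \delta) - \sum_{i \in B(c) - S_c} h_i'(c) \\
&= \Big( \sum_{i \in B(c+\delta) - S_{c+\delta} - S_c} h_i'(c + \delta) + \sum_{i \in S_c \cap B(c+\delta) - S_{c+\delta}} h_i'(c + \delta) \Big) - \Big( \sum_{i \in B(c) - S_{c+\delta} - S_c} h_i'(c) + \sum_{i \in S_{c+\delta} \cap B(c) - S_c} h_i'(c) \Big) \\
&= \Big( \sum_{i \in B(c+\delta) - S_{c+\delta} - S_c} h_i'(c + \delta) - \sum_{i \in B(c) - S_{c+\delta} - S_c} h_i'(c) \Big) + \Big( \sum_{i \in S_c \cap B(c+\delta) - S_{c+\delta}} h_i'(c + \delta) - \sum_{i \in S_{c+\delta} \cap B(c) - S_c} h_i'(c) \Big) \\
&\overset{(a)}{\le} \Big( \sum_{i \in B(c) - S_{c+\delta} - S_c} h_i'(c + \delta) - \sum_{i \in B(c) - S_{c+\delta} - S_c} h_i'(c) \Big) + \Big( \sum_{i \in S_c - S_{c+\delta}} h_i'(c + \delta) - \sum_{i \in S_{c+\delta} - S_c} h_i'(c) \Big) \\
&= \sum_{i \in B(c) - S_{c+\delta} - S_c} \big( h_i'(c + \delta) - h_i'(c) \big) + \Big( \sum_{i \in S_c - S_{c+\delta}} h_i'(c + \delta) - \sum_{i \in S_{c+\delta} - S_c} h_i'(c) \Big),
\end{aligned}$$

where inequality (a) follows from the fact that $h_i'(c + \delta) \ge 0$ for each $i \notin B(c + \delta)$ and $B(c) \supseteq B(c + \delta) \supseteq S_{c+\delta}$. Next we show

$$\sum_{i \in S_{c+\delta} - S_c} h_i'(c) \ge \sum_{i \in S_c - S_{c+\delta}} h_i'(c). \tag{49}$$

For each $i \notin S_c$, it holds that $h_i'(c) \ge \max_{j \in S_c - S_{c+\delta}} h_j'(c)$. Now, observing that $|S_c| \ge |S_{c+\delta}|$, we get

$$|S_c - S_{c+\delta}| = |S_c| - |S_c \cap S_{c+\delta}| \ge |S_{c+\delta}| - |S_c \cap S_{c+\delta}| = |S_{c+\delta} - S_c|.$$

Thus, because $\max_{j \in S_c - S_{c+\delta}} h_j'(c) \le 0$,

$$\sum_{i \in S_{c+\delta} - S_c} h_i'(c) \ge \sum_{i \in S_{c+\delta} - S_c} \max_{j \in S_c - S_{c+\delta}} h_j'(c) \ge \sum_{i \in S_c - S_{c+\delta}} \max_{j \in S_c - S_{c+\delta}} h_j'(c) \ge \sum_{i \in S_c - S_{c+\delta}} h_i'(c).$$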
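So we have

$$\begin{aligned}
G(x) - G(c) &\le \sum_{i \in B(c) - S_{c+\delta} - S_c} \big( h_i'(c + \delta) - h_i'(c) \big) + \Big( \sum_{i \in S_c - S_{c+\delta}} h_i'(c + \delta) - \sum_{i \in S_c - S_{c+\delta}} h_i'(c) \Big) \\
&= \sum_{i \in B(c) - S_{c+\delta}} \big( h_i'(c + \delta) - h_i'(c) \big) \\
&\le n \epsilon && \text{due to (42).}
\end{aligned}$$

Now we show that for any $x \in (c - \delta, c + \delta)$, it follows that $G(x) - G(c) \ge -n \epsilon$. Let $S_{c-\delta}$ and $S_c$ be the subsets of $B(c - \delta)$ and $B(c)$, where $|S_{c-\delta}| \le f$ and $|S_c| \le f$, such that $\sum_{i \in B(c-\delta) - S_{c-\delta}} h_i'(c - \delta)$ and $\sum_{i \in B(c) - S_c} h_i'(c)$ are maximized, respectively. Note that $B(c - \delta) \supseteq B(c) \supseteq S_c$.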
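$$\begin{aligned}
G(x) - G(c) &\ge G(c - \delta) - G(c) && \text{because } G(\cdot) \text{ is non-decreasing} \\
&= \sum_{i \in B(c-\delta) - S_{c-\delta}} h_i'(c - \delta) - \sum_{i \in B(c) - S_c} h_i'(c) \\
&= \Big( \sum_{i \in B(c-\delta) - S_{c-\delta} - S_c} h_i'(c - \delta) + \sum_{i \in S_c \cap B(c-\delta) - S_{c-\delta}} h_i'(c - \delta) \Big) - \Big( \sum_{i \in B(c) - S_{c-\delta} - S_c} h_i'(c) + \sum_{i \in S_{c-\delta} \cap B(c) - S_c} h_i'(c) \Big) \\
&= \Big( \sum_{i \in B(c-\delta) - S_{c-\delta} - S_c} h_i'(c - \delta) + \sum_{i \in S_c - S_{c-\delta}} h_i'(c - \delta) \Big) - \Big( \sum_{i \in B(c) - S_{c-\delta} - S_c} h_i'(c) + \sum_{i \in S_{c-\delta} \cap B(c) - S_c} h_i'(c) \Big) \\
&= \Big( \sum_{i \in B(c-\delta) - S_{c-\delta} - S_c} h_i'(c - \delta) - \sum_{i \in B(c) - S_{c-\delta} - S_c} h_i'(c) \Big) + \Big( \sum_{i \in S_c - S_{c-\delta}} h_i'(c - \delta) - \sum_{i \in S_{c-\delta} \cap B(c) - S_c} h_i'(c) \Big) \\
&\overset{(a)}{\ge} \Big( \sum_{i \in B(c-\delta) - S_{c-\delta} - S_c} h_i'(c - \delta) - \sum_{i \in B(c-\delta) - S_{c-\delta} - S_c} h_i'(c) \Big) + \Big( \sum_{i \in S_c - S_{c-\delta}} h_i'(c - \delta) - \sum_{i \in S_{c-\delta} - S_c} h_i'(c) \Big) \\
&= \sum_{i \in B(c-\delta) - S_{c-\delta} - S_c} \big( h_i'(c - \delta) - h_i'(c) \big) + \Big( \sum_{i \in S_c - S_{c-\delta}} h_i'(c - \delta) - \sum_{i \in S_{c-\delta} - S_c} h_i'(c) \Big),
\end{aligned}$$

where inequality (a) follows because $B(c) \subseteq B(c - \delta)$ and for each $i \notin B(c)$, $h_i'(c) \ge 0$. Now, observing that $|S_c| \le |S_{c-\delta}|$, we get

$$|S_c - S_{c-\delta}| = |S_c| - |S_c \cap S_{c-\delta}| \le |S_{c-\delta}| - |S_c \cap S_{c-\delta}| = |S_{c-\delta} - S_c|,$$

and for each $i \notin S_{c-\delta}$,

$$h_i'(c - \delta) \ge \max_{j \in S_{c-\delta} - S_c} h_j'(c - \delta).$$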


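Thus,

$$\begin{aligned}
G(x) - G(c) &\ge \sum_{i \in B(c-\delta) - S_{c-\delta} - S_c} \big( h_i'(c - \delta) - h_i'(c) \big) + \Big( \sum_{i \in S_c - S_{c-\delta}} h_i'(c - \delta) - \sum_{i \in S_{c-\delta} - S_c} h_i'(c) \Big) \\
&\ge \sum_{i \in B(c-\delta) - S_{c-\delta} - S_c} \big( h_i'(c - \delta) - h_i'(c) \big) + \Big( \sum_{i \in S_c - S_{c-\delta}} \max_{j \in S_{c-\delta} - S_c} h_j'(c - \delta) - \sum_{i \in S_{c-\delta} - S_c} h_i'(c) \Big) \\
&\overset{(a)}{\ge} \sum_{i \in B(c-\delta) - S_{c-\delta} - S_c} \big( h_i'(c - \delta) - h_i'(c) \big) + \Big( \sum_{i \in S_{c-\delta} - S_c} \max_{j \in S_{c-\delta} - S_c} h_j'(c - \delta) - \sum_{i \in S_{c-\delta} - S_c} h_i'(c) \Big) \\
&\ge \sum_{i \in B(c-\delta) - S_{c-\delta} - S_c} \big( h_i'(c - \delta) - h_i'(c) \big) + \Big( \sum_{i \in S_{c-\delta} - S_c} h_i'(c - \delta) - \sum_{i \in S_{c-\delta} - S_c} h_i'(c) \Big) \\
&= \sum_{i \in B(c-\delta) - S_c} \big( h_i'(c - \delta) - h_i'(c) \big) \\
&\ge -n \epsilon && \text{due to (42).}
\end{aligned}$$

Inequality (a) follows because $|S_{c-\delta} - S_c| \ge |S_c - S_{c-\delta}|$ and $\max_{j \in S_{c-\delta} - S_c} h_j'(c - \delta) \le 0$.

Thus we have, for any $\epsilon_0 = n \epsilon$, there exists $\delta$ such that

$$|x - c| < \delta \Rightarrow |G(x) - G(c)| < \epsilon_0.$$

Therefore, $G(\cdot)$ is continuous.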

□ 


F Proof of Proposition 3 

Proof. We first show that $g_K(x)$ is non-decreasing.

For $1 \le K \le n$, define $C(x, K) \subseteq \mathcal{V}$ as a collection of agents such that $|C(x, K)| = K$ and

$$h_i'(x) \ge h_j'(x),$$

for each $i \in C(x, K)$ and for each $j \notin C(x, K)$.

Let $x_1, x_2$ be such that $x_1 < x_2$. To show that $g_K(x)$ is non-decreasing, we need to show that $g_K(x_1) \le g_K(x_2)$. Suppose, on the contrary, that $g_K(x_1) > g_K(x_2)$. That is, $\min_{k \in C(x_1, K)} h_k'(x_1) > \min_{k \in C(x_2, K)} h_k'(x_2)$, by definition of sets $C(x_1, K)$ and $C(x_2, K)$. Letting $k_1 \in \arg\min_{k \in C(x_1, K)} h_k'(x_1)$ and $k_2 \in \arg\min_{k \in C(x_2, K)} h_k'(x_2)$, the previous consequence can be rewritten as $h_{k_2}'(x_2) < h_{k_1}'(x_1)$ and $k_1 \neq k_2$. The claim that $k_1 \neq k_2$ follows from the fact that $h_i'$ is non-decreasing for each $i$.

Thus, for each $k \in C(x_1, K)$, we have

$$\begin{aligned}
h_{k_2}'(x_2) < h_{k_1}'(x_1) &\le h_k'(x_1) && \text{by definition of } k_1 \\
&\le h_k'(x_2) && \text{since } h_k'(\cdot) \text{ is non-decreasing.}
\end{aligned} \tag{50}$$

Thus $h_{k_2}'(x_2) < h_k'(x_2)$ for each $k \in C(x_1, K)$. By definition of $C(x_2, K)$, it follows that $C(x_1, K) \subseteq C(x_2, K)$. In addition, it must be that $k_2 \notin C(x_1, K)$; otherwise, by (50) we have $h_{k_2}'(x_2) < h_{k_2}'(x_2)$, a contradiction. Thus $C(x_1, K) \cup \{k_2\} \subseteq C(x_2, K)$.

Recall that $|C(x, K)| = K$ for any $x$ and any $1 \le K \le n$. Then,

$$K = |C(x_2, K)| \ge |C(x_1, K) \cup \{k_2\}| = |C(x_1, K)| + |\{k_2\}| = K + 1, \tag{51}$$

a contradiction.

Therefore, $g_K(x_1) \le g_K(x_2)$ and $g_K(\cdot)$ is non-decreasing.

Recall that each $h_i(x)$ is continuously differentiable. Thus function $h_i'(x)$ exists and is continuous, i.e., for all $i$ and any $\epsilon > 0$, there exists $\delta > 0$ such that

$$x \in (c - \delta, c + \delta) \Rightarrow |h_i'(x) - h_i'(c)| < \epsilon.$$

Now we show that function $g_K(\cdot)$ is continuous. In particular, we show that $\forall \epsilon > 0$, $\exists \delta > 0$ such that

$$|x - c| < \delta \Rightarrow |g_K(x) - g_K(c)| < \epsilon.$$

Let $k_1 \in \arg\min_{k \in C(c, K)} h_k'(c)$, $k_2 \in \arg\min_{k \in C(c+\delta, K)} h_k'(c + \delta)$ and $k_3 \in \arg\min_{k \in C(c-\delta, K)} h_k'(c - \delta)$. We first prove that $x < c + \delta \Rightarrow g_K(x) < g_K(c) + \epsilon$. We consider two cases: (i) $h_{k_1}'(c + \delta) \ge h_{k_2}'(c + \delta)$ and (ii) $h_{k_1}'(c + \delta) < h_{k_2}'(c + \delta)$, respectively.
Case 1: $h_{k_1}'(c + \delta) \ge h_{k_2}'(c + \delta)$.

$$\begin{aligned}
g_K(x) &\le g_K(c + \delta) && \text{by monotonicity of } g_K(\cdot) \\
&= h_{k_2}'(c + \delta) && \text{by definition of } g_K(c + \delta) \\
&\le h_{k_1}'(c + \delta) && \text{by assumption} \\
&< h_{k_1}'(c) + \epsilon && \text{by continuity of } h_{k_1}'(\cdot) \\
&= g_K(c) + \epsilon && \text{by definition of } g_K(c).
\end{aligned}$$

Case 2: $h_{k_1}'(c + \delta) < h_{k_2}'(c + \delta)$. We observe that there exists $k^*$ such that $k^* \notin C(c, K)$ but $k^* \in C(c + \delta, K)$. Note that it is possible that $k^* = k_2$. If this is not true, then for each $k \in C(c + \delta, K)$, it also holds that $k \in C(c, K)$, i.e., $C(c + \delta, K) \subseteq C(c, K)$. On the other hand, we know $k_1 \in C(c, K)$ but $k_1 \notin C(c + \delta, K)$, which follows by the assumption of case 2 and the definition of $k_2$. Thus we have $C(c + \delta, K) \cup \{k_1\} \subseteq C(c, K)$. Similar to (51), we arrive at a contradiction, and the claim follows.

With this observation, we have

$$\begin{aligned}
g_K(x) &\le g_K(c + \delta) && \text{by monotonicity of } g_K(\cdot) \\
&= h_{k_2}'(c + \delta) && \text{by definition of } g_K(c + \delta) \\
&\le h_{k^*}'(c + \delta) && \text{by the fact that } k^* \in C(c + \delta, K) \\
&< h_{k^*}'(c) + \epsilon && \text{by continuity of } h_{k^*}'(\cdot) \\
&\le h_{k_1}'(c) + \epsilon && \text{by the fact that } k^* \notin C(c, K) \\
&= g_K(c) + \epsilon && \text{by definition of } g_K(c).
\end{aligned}$$

Thus, we have shown that $x < c + \delta \Rightarrow g_K(x) < g_K(c) + \epsilon$.


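Now we show that $x > c - \delta \Rightarrow g_K(x) > g_K(c) - \epsilon$. Recall that $k_1 \in \arg\min_{k \in C(c, K)} h_k'(c)$ and $k_3 \in \arg\min_{k \in C(c-\delta, K)} h_k'(c - \delta)$. We consider two cases: (i) $h_{k_1}'(c - \delta) \le h_{k_3}'(c - \delta)$ and (ii) $h_{k_1}'(c - \delta) > h_{k_3}'(c - \delta)$, respectively.

Case 1: $h_{k_1}'(c - \delta) \le h_{k_3}'(c - \delta)$.

$$\begin{aligned}
g_K(x) &\ge g_K(c - \delta) && \text{by monotonicity of } g_K(\cdot) \\
&= h_{k_3}'(c - \delta) && \text{by definition of } g_K(c - \delta) \\
&\ge h_{k_1}'(c - \delta) && \text{by assumption} \\
&> h_{k_1}'(c) - \epsilon && \text{by continuity of } h_{k_1}'(\cdot) \\
&= g_K(c) - \epsilon && \text{by definition of } g_K(c).
\end{aligned}$$

Case 2: $h_{k_1}'(c - \delta) > h_{k_3}'(c - \delta)$. If $k_3 \in C(c, K)$,

$$\begin{aligned}
g_K(x) &\ge g_K(c - \delta) && \text{by monotonicity of } g_K(\cdot) \\
&= h_{k_3}'(c - \delta) && \text{by definition of } g_K(c - \delta) \\
&> h_{k_3}'(c) - \epsilon && \text{by continuity of } h_{k_3}'(\cdot) \\
&\ge h_{k_1}'(c) - \epsilon && \text{by definition of } C(c, K) \text{ and the fact that } k_3 \in C(c, K) \\
&= g_K(c) - \epsilon && \text{by definition of } g_K(c).
\end{aligned}$$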
If $k_3 \notin C(c, K)$, then there exists $k^*$ such that $k^* \notin C(c - \delta, K)$ and $k^* \in C(c, K)$. If this is not true, then for each $k$ such that $k \in C(c, K)$, it also holds that $k \in C(c - \delta, K)$, i.e., $C(c, K) \subseteq C(c - \delta, K)$. By assumption, $k_3 \notin C(c, K)$. Thus, we get $C(c, K) \cup \{k_3\} \subseteq C(c - \delta, K)$. Similar to (51), we arrive at a contradiction, and the claim follows.

With this observation, we have

$$\begin{aligned}
g_K(x) &\ge g_K(c - \delta) && \text{by monotonicity of } g_K(\cdot) \\
&= h_{k_3}'(c - \delta) && \text{by definition of } g_K(c - \delta) \\
&\ge h_{k^*}'(c - \delta) && \text{by the fact that } k^* \notin C(c - \delta, K) \\
&> h_{k^*}'(c) - \epsilon && \text{by continuity of } h_{k^*}'(\cdot) \\
&\ge h_{k_1}'(c) - \epsilon && \text{by the fact that } k^* \in C(c, K) \\
&= g_K(c) - \epsilon && \text{by definition of } g_K(c).
\end{aligned}$$

Thus, $x > c - \delta \Rightarrow g_K(x) > g_K(c) - \epsilon$.

Therefore, we have shown that $\forall \epsilon > 0$, $\exists \delta > 0$ such that

$$|x - c| < \delta \Rightarrow |g_K(x) - g_K(c)| < \epsilon,$$

i.e., $g_K(\cdot)$ is continuous. □
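The order statistic $g_K$ is simple to compute, which makes the monotonicity claim easy to probe on examples. The sketch below takes $g_K(x)$ to be the $K$-th largest value among $h_1'(x), \dots, h_n'(x)$, as recalled in the proof of Lemma 2. The function name `g` and the weighted quadratic costs are illustrative assumptions; they are chosen so that the ranking of the agents changes with $x$.

```python
def g(K, grads_at_x):
    """K-th largest value in the list of gradients (K is 1-indexed)."""
    return sorted(grads_at_x, reverse=True)[K - 1]

# Illustrative costs h_i(x) = w_i * (x - c_i)^2, so h_i'(x) = 2 * w_i * (x - c_i).
params = [(1.0, 0.0), (0.5, 3.0), (2.0, -1.0), (1.5, 2.0)]   # (w_i, c_i) pairs

def grads(x):
    return [2 * w * (x - c) for w, c in params]

xs = [k / 8 for k in range(-40, 41)]
curves = {K: [g(K, grads(x)) for x in xs] for K in range(1, len(params) + 1)}
second_largest = g(2, [5, -1, 3, 0])  # 3
```

Each list in `curves` is non-decreasing along the grid even though the agent attaining the $K$-th largest gradient changes from point to point, which is the situation handled by the sets $C(x, K)$ in the proof.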

G Proof of Lemma 2 

Proof. If there exists $x \in \mathbb{R}$ that satisfies equation (21) in Algorithm 2, then the algorithm will not return $\perp$. Thus, to prove this lemma, it suffices to show that there exists $x \in \mathbb{R}$ that satisfies equation (21). Consider the multiset of admissible functions $\{h_1(x), h_2(x), \dots, h_n(x)\}$ obtained by a non-faulty agent in Step 1 of Algorithm 2. Define $X_i = \arg\min_{x \in \mathbb{R}} h_i(x)$. Let $\max X_i$ and $\min X_i$ denote the largest and smallest values in $X_i$, respectively. Sort the above $n$ functions $h_i(x)$ in increasing order of their $\max X_i$ values, breaking ties arbitrarily. Let $i_0$ denote the $(f+1)$-th agent in this sorted order (i.e., $i_0$ has the $(f+1)$-th smallest value in the above sorted order). Similarly, sort the functions $h_i(x)$ in decreasing order of $\min X_i$ values, breaking ties arbitrarily. Let $j_0$ denote the $(f+1)$-th agent in this sorted order (i.e., $j_0$ has the $(f+1)$-th largest value in the above sorted order).

Consider $x_1 \in X_{i_0}$ and $x_2 \in X_{j_0}$. By the choice of $x_1$ and the definition of $i_0$, at most $f$ values in $\{h_1'(x_1), h_2'(x_1), \dots, h_n'(x_1)\}$ can be positive. Recall that

$$g_K(x) = K\text{-th largest value in the set } \{h_1'(x), h_2'(x), \dots, h_n'(x)\}.$$

Thus we have $g_K(x_1) \le 0$ for $K = f + 1, \dots, n$. Consequently, $\sum_{K=f+1}^{n-f} g_K(x_1) \le 0$.

Similarly, it can be shown that $g_K(x_2) \ge 0$ for $K = 1, \dots, n - f$, and $\sum_{K=f+1}^{n-f} g_K(x_2) \ge 0$.

If $\sum_{K=f+1}^{n-f} g_K(x_1) = 0$ or $\sum_{K=f+1}^{n-f} g_K(x_2) = 0$, then $x_1$ or $x_2$, respectively, satisfies equation (21), proving the lemma.

Let us now consider the case when $\sum_{K=f+1}^{n-f} g_K(x_1) < 0$ and $\sum_{K=f+1}^{n-f} g_K(x_2) > 0$. By Proposition 3, we have that $\sum_{K=f+1}^{n-f} g_K(\cdot)$ is non-decreasing and continuous. Then it follows that $x_1 < x_2$, and there exists $x \in [x_1, x_2]$ such that $\sum_{K=f+1}^{n-f} g_K(x) = 0$, i.e., $x$ satisfies equation (21), proving the lemma.

□ 
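The existence argument above is constructive: it brackets a root of the trimmed gradient sum between $x_1$ and $x_2$ and relies on monotonicity and continuity, so bisection finds it. The following sketch is one way to do that and is not claimed to be how Algorithm 2 itself computes the solution of (21). The names `trimmed_gradient_sum`, `bracket_from_centers` and `solve_trimmed` are hypothetical, and the bracket helper assumes quadratic costs $h_i(x) = (x - c_i)^2$, for which $X_i = \{c_i\}$.

```python
def trimmed_gradient_sum(x, derivs, f):
    """Sum of g_K(x) for K = f+1..n-f: drop the f largest and f smallest gradients."""
    vals = sorted(d(x) for d in derivs)
    return sum(vals[f:len(vals) - f])

def bracket_from_centers(centers, f):
    """x1 = (f+1)-th smallest optimum, x2 = (f+1)-th largest optimum (singleton X_i)."""
    s = sorted(centers)
    return s[f], s[-f - 1]

def solve_trimmed(derivs, f, lo, hi, tol=1e-9):
    """Bisection; assumes the trimmed sum is <= 0 at lo and >= 0 at hi."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if trimmed_gradient_sum(mid, derivs, f) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Illustrative instance with n = 4, f = 1.
centers = [0.0, 1.0, 2.0, 10.0]
derivs = [lambda x, c=c: 2 * (x - c) for c in centers]
lo, hi = bracket_from_centers(centers, 1)
root = solve_trimmed(derivs, 1, lo, hi)
```

In this instance the outlying optimum at 10 is trimmed away for every $x$ in the bracket $[1, 2]$, and the root lies at the midpoint of the two middle optima.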
H Proofs of Theorems 6 and 7

By Lemma 2, we know that Algorithm 2 returns a value in $\mathbb{R}$. Let $\tilde{x}$ be the output of Algorithm 2 for the set of functions $\{h_1(x), \dots, h_n(x)\}$ gathered in Step 1 of the algorithm. Sort the above $n$ functions $h_i(x)$ in non-increasing order of their $h_i'(\tilde{x})$ values, breaking ties arbitrarily. Let $F_1^*$ denote the first $f$ agents in this sorted order (i.e., agents in $F_1^*$ have the $f$ largest values in the above sorted order); and let $F_2^*$ denote the last $f$ agents in this sorted order (i.e., agents in $F_2^*$ have the $f$ smallest values in the above sorted order).

Denote $\mathcal{R}^* = \mathcal{V} - F_1^* - F_2^*$. We have

$$\sum_{i \in \mathcal{R}^*} h_i'(\tilde{x}) = \sum_{i \in [n] - F_1^* - F_2^*} h_i'(\tilde{x}) = \sum_{K=f+1}^{n-f} g_K(\tilde{x}) = 0 \quad \text{by (21).} \tag{52}$$

The remaining proof is identical to the proof of Theorems 3 and 4 with $F_1$ replaced by $F_1^*$, and $F_2$ replaced by $F_2^*$.

I Proof of Lemma 6 

Proof. Let $x_1, x_2 \in Y$ such that $x_1 \neq x_2$. By definition of $Y$, there exist valid functions $p_1(x) = \sum_{i \in \mathcal{N}} \alpha_i h_i(x) \in \mathcal{C}$ and $p_2(x) = \sum_{i \in \mathcal{N}} \beta_i h_i(x) \in \mathcal{C}$ such that $x_1 \in \arg\min p_1(x)$ and $x_2 \in \arg\min p_2(x)$, respectively. Note that it is possible that $p_1(\cdot) = p_2(\cdot)$, and that $p_i(\cdot) = p(\cdot)$ for $i = 1$ or $i = 2$.

Given $0 \le \alpha \le 1$, let $x_\alpha = \alpha x_1 + (1 - \alpha) x_2$. We consider two cases:

(i) $x_\alpha \in \arg\min p_1(x) \cup \arg\min p_2(x) \cup \arg\min p(x)$, and

(ii) $x_\alpha \notin \arg\min p_1(x) \cup \arg\min p_2(x) \cup \arg\min p(x)$.

Case (i): $x_\alpha \in \arg\min p_1(x) \cup \arg\min p_2(x) \cup \arg\min p(x)$. In this case, by definition of $Y$, we have

$$x_\alpha \in \arg\min p_1(x) \cup \arg\min p_2(x) \cup \arg\min p(x) \subseteq Y.$$

Thus, $x_\alpha \in Y$.

Case (ii): $x_\alpha \notin \arg\min p_1(x) \cup \arg\min p_2(x) \cup \arg\min p(x)$. By symmetry, WLOG, assume that $x_1 < x_2$. By definition of $x_\alpha$, it holds that $x_1 \le x_\alpha \le x_2$. By assumption of case (ii), it must be that $x_\alpha > \max(\arg\min p_1(x))$ and $x_\alpha < \min(\arg\min p_2(x))$, which imply that $p_1'(x_\alpha) > 0$ and $p_2'(x_\alpha) < 0$.

There are two possibilities for $p'(x_\alpha)$ (the gradient of $p$ at $x_\alpha$): $p'(x_\alpha) < 0$ or $p'(x_\alpha) > 0$. Note that $p'(x_\alpha) \neq 0$ because $x_\alpha \notin \arg\min p(x)$.

When $p'(x_\alpha) < 0$, there exists $0 \le \zeta \le 1$ such that

$$\zeta \, p_1'(x_\alpha) + (1 - \zeta) \, p'(x_\alpha) = 0.$$
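By definition of $p_1(x)$ and $p(x)$, we have

$$\begin{aligned}
0 = \zeta \, p_1'(x_\alpha) + (1 - \zeta) \, p'(x_\alpha) &= \zeta \Big( \sum_{i \in \mathcal{N}} \alpha_i h_i'(x_\alpha) \Big) + (1 - \zeta) \Big( \frac{1}{|\mathcal{N}|} \sum_{i \in \mathcal{N}} h_i'(x_\alpha) \Big) \\
&= \sum_{i \in \mathcal{N}} \Big( \alpha_i \zeta + (1 - \zeta) \frac{1}{|\mathcal{N}|} \Big) h_i'(x_\alpha).
\end{aligned}$$

Thus, $x_\alpha$ is an optimum of the function

$$\sum_{i \in \mathcal{N}} \Big( \alpha_i \zeta + (1 - \zeta) \frac{1}{|\mathcal{N}|} \Big) h_i(x). \tag{53}$$

Let $\mathcal{I}$ be the collection of indices defined by

$$\mathcal{I} = \Big\{ i : i \in \mathcal{N}, \text{ and } \alpha_i \zeta + (1 - \zeta) \frac{1}{|\mathcal{N}|} \ge \frac{1}{2(|\mathcal{N}| - f)} \Big\}.$$

Next we show that $|\mathcal{I}| \ge |\mathcal{N}| - f$. Let $\mathcal{I}_1$ be the collection of indices defined by

$$\mathcal{I}_1 = \Big\{ i : i \in \mathcal{N}, \text{ and } \alpha_i \ge \frac{1}{2(|\mathcal{N}| - f)} \Big\}.$$

Since $p_1(x) \in \mathcal{C}$, we have $|\mathcal{I}_1| \ge |\mathcal{N}| - f$. In addition, since $n > 3f$, $|\mathcal{N}| < 2(|\mathcal{N}| - f)$. Then, for each $j \in \mathcal{I}_1$, we have

$$\alpha_j \zeta + (1 - \zeta) \frac{1}{|\mathcal{N}|} \ge \zeta \frac{1}{2(|\mathcal{N}| - f)} + (1 - \zeta) \frac{1}{|\mathcal{N}|} \ge \zeta \frac{1}{2(|\mathcal{N}| - f)} + (1 - \zeta) \frac{1}{2(|\mathcal{N}| - f)} = \frac{1}{2(|\mathcal{N}| - f)},$$

i.e., $j \in \mathcal{I}$. Thus, $\mathcal{I}_1 \subseteq \mathcal{I}$.

Since $|\mathcal{I}_1| \ge |\mathcal{N}| - f$, we have $|\mathcal{I}| \ge |\mathcal{N}| - f$. So function (53) is a valid function. Thus, $x_\alpha \in Y$.

Similarly, we can show that the above result holds when $p'(x_\alpha) > 0$.

Therefore, set $Y$ is convex.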

□ 
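
The counting step in the proof above can be checked numerically. The following Python sketch is illustrative only: the helper names (`random_valid_weights`, `mix_with_uniform`, `count_bounded`) and the way a valid weight vector is sampled are assumptions made for this example and do not come from the paper. It mixes a valid weight vector with the uniform weights $1/|\mathcal{N}|$, as in function (53), and counts how many mixed weights stay at or above $1/(2(|\mathcal{N}| - f))$.

```python
import random


def random_valid_weights(size, f, rng):
    """Sample weights over `size` non-faulty agents such that at least
    size - f of them are >= 1 / (2 * (size - f)). Hypothetical sampler."""
    bound = 1.0 / (2 * (size - f))
    # size - f agents start at the bound (total mass 1/2), f agents at zero.
    base = [bound] * (size - f) + [0.0] * f
    # Spread the remaining mass 1/2 at random over all agents.
    extra = [rng.uniform(0.1, 1.0) for _ in range(size)]
    scale = 0.5 / sum(extra)
    weights = [b + scale * e for b, e in zip(base, extra)]
    rng.shuffle(weights)
    return weights


def mix_with_uniform(alpha, zeta):
    """Weights of function (53): zeta * alpha_i + (1 - zeta) / |N|."""
    size = len(alpha)
    return [zeta * a + (1.0 - zeta) / size for a in alpha]


def count_bounded(weights, bound, tol=1e-12):
    """Number of weights that are at least `bound` (up to rounding)."""
    return sum(1 for w in weights if w >= bound - tol)
```

The check relies on $|\mathcal{N}| > 2f$, which follows from $n > 3f$; without it the uniform weight $1/|\mathcal{N}|$ could fall below the bound.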


J Proof of Lemma 7 

Proof. Let $\mathcal{R}^*[t-1]$ denote the set of nodes from whom the remaining $n - 2f$ values were received 
in iteration $t$, and let us denote by $\mathcal{L}[t-1]$ and $\mathcal{S}[t-1]$ the set of nodes from whom the largest 
$f$ values and the smallest $f$ values were received in iteration $t$. Due to the fact that each value is 
transmitted using Byzantine broadcast, $\mathcal{R}^*[t-1]$, $\mathcal{L}[t-1]$ and $\mathcal{S}[t-1]$ do not depend on $j$, for 
$j \in \mathcal{N}$. Thus, $\mathcal{R}^*[t-1]$, $\mathcal{L}[t-1]$ and $\mathcal{S}[t-1]$ are well-defined. 
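
The partition described in this paragraph can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's pseudocode: the function name `trim_and_midpoint`, the dictionary representation of the received values, and the tie-break by node identifier are all choices made for this example. The returned midpoint of the smallest and largest surviving values is the quantity that the remainder of this proof analyses.

```python
def trim_and_midpoint(values, f):
    """Split node ids into the f smallest, the n - 2f remaining, and the
    f largest received values, and return the midpoint of the extremes of
    the remaining values.

    values: dict mapping node id -> received scalar gradient.
    """
    n = len(values)
    if n <= 2 * f:
        raise ValueError("need more than 2f values")
    # Every non-faulty agent sees the same values (Byzantine broadcast), so
    # a deterministic sort yields the same sets at every non-faulty agent.
    # Breaking ties by node id is an assumption of this sketch.
    order = sorted(values, key=lambda node: (values[node], node))
    smallest = order[:f]
    remaining = order[f:n - f]
    largest = order[n - f:]
    g_min = values[remaining[0]]    # smallest surviving value
    g_max = values[remaining[-1]]   # largest surviving value
    return smallest, remaining, largest, 0.5 * (g_min + g_max)
```

Slicing with `order[n - f:]` rather than `order[-f:]` keeps the function correct when `f` is 0.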

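Let $i^*, j^* \in \mathcal{R}^*[t-1]$ such that $g_{i^*}[t-1] = \check{g}[t-1]$ and $g_{j^*}[t-1] = \hat{g}[t-1]$, i.e., $\check{g}[t-1]$ and $\hat{g}[t-1]$ are the smallest and the largest of the remaining values. Recall that $|\mathcal{F}| = \phi$. 
Let $\mathcal{L}^*[t-1] \subseteq \mathcal{L}[t-1] - \mathcal{F}$ and $\mathcal{S}^*[t-1] \subseteq \mathcal{S}[t-1] - \mathcal{F}$ such that 

\[
|\mathcal{L}^*[t-1]| = f - \phi + |\mathcal{R}^*[t-1] \cap \mathcal{F}|
\]

and 

\[
|\mathcal{S}^*[t-1]| = f - \phi + |\mathcal{R}^*[t-1] \cap \mathcal{F}|.
\]

We consider two cases: (i) $\hat{g}[t-1] > \check{g}[t-1]$ and (ii) $\hat{g}[t-1] = \check{g}[t-1]$. 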
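Case (i): $\hat{g}[t-1] > \check{g}[t-1]$. By definition of $\mathcal{L}^*[t-1]$ and $\mathcal{S}^*[t-1]$, we have 

\[
\frac{1}{f - \phi + |\mathcal{R}^*[t-1] \cap \mathcal{F}|} \sum_{i \in \mathcal{S}^*[t-1]} g_i[t-1]
\;\le\; \tilde{g}[t-1] \;\le\;
\frac{1}{f - \phi + |\mathcal{R}^*[t-1] \cap \mathcal{F}|} \sum_{j \in \mathcal{L}^*[t-1]} g_j[t-1].
\]

Thus, there exists $0 \le \xi \le 1$ such that 

\[
\begin{aligned}
\tilde{g}[t-1] &= \xi \Big( \frac{1}{f - \phi + |\mathcal{R}^*[t-1] \cap \mathcal{F}|} \sum_{i \in \mathcal{S}^*[t-1]} g_i[t-1] \Big)
 + (1 - \xi) \Big( \frac{1}{f - \phi + |\mathcal{R}^*[t-1] \cap \mathcal{F}|} \sum_{j \in \mathcal{L}^*[t-1]} g_j[t-1] \Big) \\
&= \frac{\xi}{f - \phi + |\mathcal{R}^*[t-1] \cap \mathcal{F}|} \sum_{i \in \mathcal{S}^*[t-1]} g_i[t-1]
 + \frac{1 - \xi}{f - \phi + |\mathcal{R}^*[t-1] \cap \mathcal{F}|} \sum_{j \in \mathcal{L}^*[t-1]} g_j[t-1].
\end{aligned}
\tag{54}
\]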


By symmetry, WLOG, assume $\xi \ge \tfrac{1}{2}$. 

Let $k \in \mathcal{R}^*[t-1] - \mathcal{F}$. By symmetry, WLOG, assume $g_k[t-1] \le \tilde{g}[t-1]$. Since $|\mathcal{L}[t-1] \cup \{j^*\}| = f + 1$, there exists a non-faulty agent $j'_k \in \mathcal{L}[t-1] \cup \{j^*\}$. Thus, $g_{j'_k}[t-1] \ge \hat{g}[t-1] \ge \tilde{g}[t-1]$, and 
there exists $0 \le \xi_k \le 1$ such that 

\[
\tfrac{1}{2} \big( \hat{g}[t-1] + \check{g}[t-1] \big) = \tilde{g}[t-1] = \xi_k\, g_k[t-1] + (1 - \xi_k)\, g_{j'_k}[t-1]. \tag{55}
\]


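Since $\check{g}[t-1] \le g_k[t-1]$ and $g_{j'_k}[t-1] \ge \hat{g}[t-1]$, it can be shown that $\tfrac{1}{2} \le \xi_k \le 1$. Therefore, we 
have 

\[
\begin{aligned}
\tilde{g}[t-1]
&= \frac{|\mathcal{R}^*[t-1] - \mathcal{F}|}{|\mathcal{N}| - f}\, \tilde{g}[t-1]
 + \frac{f - \phi + |\mathcal{R}^*[t-1] \cap \mathcal{F}|}{|\mathcal{N}| - f}\, \tilde{g}[t-1] \\
&= \frac{1}{|\mathcal{N}| - f} \sum_{k \in \mathcal{R}^*[t-1] - \mathcal{F}} \tilde{g}[t-1]
 + \frac{f - \phi + |\mathcal{R}^*[t-1] \cap \mathcal{F}|}{|\mathcal{N}| - f}\, \tilde{g}[t-1] \\
&= \frac{1}{|\mathcal{N}| - f} \sum_{k \in \mathcal{R}^*[t-1] - \mathcal{F}} \big( \xi_k\, g_k[t-1] + (1 - \xi_k)\, g_{j'_k}[t-1] \big) \\
&\quad + \frac{\xi}{|\mathcal{N}| - f} \sum_{i \in \mathcal{S}^*[t-1]} g_i[t-1]
 + \frac{1 - \xi}{|\mathcal{N}| - f} \sum_{j \in \mathcal{L}^*[t-1]} g_j[t-1].
\end{aligned}
\tag{56}
\]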


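In (56), for each $k \in \mathcal{R}^*[t-1] - \mathcal{F}$, it holds that $\frac{\xi_k}{|\mathcal{N}| - f} \ge \frac{1}{2(|\mathcal{N}| - f)}$. For each $k \in \mathcal{S}^*[t-1]$, it holds 
that $\frac{\xi}{|\mathcal{N}| - f} \ge \frac{1}{2(|\mathcal{N}| - f)}$. In addition, we have 

\[
\begin{aligned}
\big| (\mathcal{R}^*[t-1] - \mathcal{F}) \cup \mathcal{S}^*[t-1] \big|
&= |\mathcal{R}^*[t-1] - \mathcal{F}| + |\mathcal{S}^*[t-1]| \\
&= |\mathcal{R}^*[t-1]| - |\mathcal{R}^*[t-1] \cap \mathcal{F}| + |\mathcal{S}^*[t-1]| \\
&= n - 2f - |\mathcal{R}^*[t-1] \cap \mathcal{F}| + f - \phi + |\mathcal{R}^*[t-1] \cap \mathcal{F}| \\
&= n - \phi - f = |\mathcal{N}| - f.
\end{aligned}
\]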


Thus, in (56), at least $|\mathcal{N}| - f$ non-faulty agents are assigned weights lower bounded by 
$\frac{1}{2(|\mathcal{N}| - f)}$. Thus $\tilde{g}[t-1]$ is a gradient of a valid function in $\mathcal{C}$ at $x[t-1]$, i.e., there exists $p(x) \in \mathcal{C}$ 
such that $\tilde{g}[t-1] = p'(x[t-1])$. 
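
The decomposition in (56) can be reproduced on concrete numbers. The Python sketch below is an illustration under stated assumptions rather than the paper's procedure: the function name `trimmed_decomposition`, the dictionary inputs, the tie-break by node identifier, and the choice of the first non-faulty node as partner are all hypothetical. The faulty set is passed in explicitly, which is only possible in a simulation. Both symmetric sub-cases ($g_k$ below or above the midpoint) are handled by the same formula for $\xi_k$.

```python
def trimmed_decomposition(values, faulty, f):
    """Return (g_tilde, weights) where weights covers only non-faulty nodes
    and g_tilde == sum(weights[i] * values[i]), following (54)-(56).
    Assumes len(values) > 3 * f and len(faulty) <= f."""
    n, phi = len(values), len(faulty)
    order = sorted(values, key=lambda node: (values[node], node))
    small, rest, large = order[:f], order[f:n - f], order[n - f:]
    i_star, j_star = rest[0], rest[-1]
    g_tilde = 0.5 * (values[i_star] + values[j_star])

    def honest(nodes):
        return [v for v in nodes if v not in faulty]

    denom = (n - phi) - f                       # |N| - f
    weights = {node: 0.0 for node in values if node not in faulty}

    # Each of these lists has f + 1 nodes, so one of them is non-faulty.
    low_partner = honest(small + [i_star])[0]   # value <= smallest survivor
    high_partner = honest(large + [j_star])[0]  # value >= largest survivor

    # Term built from (55): one pairing per non-faulty surviving node.
    for k in honest(rest):
        partner = high_partner if values[k] <= g_tilde else low_partner
        gap = values[partner] - values[k]
        xi_k = 1.0 if gap == 0 else (values[partner] - g_tilde) / gap
        weights[k] += xi_k / denom
        weights[partner] += (1.0 - xi_k) / denom

    # Term built from (54): non-faulty subsets S*, L* of common size m.
    m = f - phi + len([v for v in rest if v in faulty])
    if m > 0:
        s_star, l_star = honest(small)[:m], honest(large)[:m]
        avg_s = sum(values[i] for i in s_star) / m
        avg_l = sum(values[j] for j in l_star) / m
        xi = 0.5 if avg_l == avg_s else (avg_l - g_tilde) / (avg_l - avg_s)
        for i in s_star:
            weights[i] += xi / denom
        for j in l_star:
            weights[j] += (1.0 - xi) / denom
    return g_tilde, weights
```

When the computed `xi` falls below one half, the symmetric argument applies and the nodes of `l_star`, rather than `s_star`, carry the bounded weights; the count of weights at or above $1/(2(|\mathcal{N}| - f))$ is unaffected.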
Case (ii): $\hat{g}[t-1] = \check{g}[t-1]$. Let $k \in \mathcal{R}^*[t-1] - \mathcal{F}$. Since $\hat{g}[t-1] \ge g_k[t-1] \ge \check{g}[t-1]$ and 
$\hat{g}[t-1] = \check{g}[t-1]$, it holds that $\hat{g}[t-1] = g_k[t-1] = \check{g}[t-1]$. Consequently, we have 

\[
\tilde{g}[t-1] = \tfrac{1}{2} \big( \hat{g}[t-1] + \check{g}[t-1] \big) = g_k[t-1].
\]


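So we can rewrite $\tilde{g}[t-1]$ as follows, where $\xi \ge \tfrac{1}{2}$ is chosen as in Case (i); the bounds that yield (54) continue to hold when $\hat{g}[t-1] = \check{g}[t-1]$. 

\[
\begin{aligned}
\tilde{g}[t-1]
&= \frac{|\mathcal{R}^*[t-1] - \mathcal{F}|}{|\mathcal{N}| - f}\, \tilde{g}[t-1]
 + \frac{f - \phi + |\mathcal{R}^*[t-1] \cap \mathcal{F}|}{|\mathcal{N}| - f}\, \tilde{g}[t-1] \\
&= \frac{1}{|\mathcal{N}| - f} \sum_{k \in \mathcal{R}^*[t-1] - \mathcal{F}} g_k[t-1]
 + \frac{\xi}{|\mathcal{N}| - f} \sum_{i \in \mathcal{S}^*[t-1]} g_i[t-1]
 + \frac{1 - \xi}{|\mathcal{N}| - f} \sum_{j \in \mathcal{L}^*[t-1]} g_j[t-1].
\end{aligned}
\tag{57}
\]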


Thus, in (57), at least $|\mathcal{N}| - f$ non-faulty agents are assigned weights lower bounded by 
$\frac{1}{2(|\mathcal{N}| - f)}$. Thus $\tilde{g}[t-1]$ is a gradient of a valid function in $\mathcal{C}$ at $x[t-1]$, i.e., there exists $p(x) \in \mathcal{C}$ 
such that $\tilde{g}[t-1] = p'(x[t-1])$. 

Case (i) and Case (ii) together prove the lemma. 

□ 









